The book Bayesian Inference in
Econometrics contains some of the major
research contributions of the author.
Unfortunately, his productive career was
cut short by his most tragic and sudden
demise on May 28, 1998, at the young age
of 40.

The contribution of the author in this
book lies in demonstrating the immense value
and wide applicability of Bayesian
methods in econometrics and economic
analysis at large. The Bayesian and
classical approaches to inference are
compared and contrasted in detail. Some
important results relating to the
“admissibility” of estimators have been
summarized. The posterior distribution of
the “shrinkage” factors of the ridge
regression has been derived and a
modified version of the principal
component regression has been
presented. Bayesian analysis of the linear
regression model with (i) auto-correlated
disturbances and (ii) errors in variables
is an important feature of the book. The
second aspect analysed deals with
applications of Bayesian methods to
certain issues in economic analysis.

Econometricians, economists and
statisticians will find a wealth of
interesting material in the book. It is
hoped that in the coming years many
researchers will build upon the author's

Foreword


The present volume contains some of the major research 
contributions of Dr. Avanindra Narayan Bhat. His productive 
career was cut short by his most tragic and sudden demise 
on May 28, 1998, at the young age of 40. Dr. Bhat was a 
very perceptive and versatile scholar. He was an Associate 
Professor in the Centre for Economic Studies and Planning 
at Jawaharlal Nehru University, New Delhi from 1990 until 
his death. I had the good fortune of being his teacher during 
1977-79, when he pursued the M.A. Economics programme 
at the Delhi School of Economics specializing in 
econometrics. Subsequently, at the University of Michigan, 
Ann Arbor, U.S.A., he cultivated his interest in Bayesian 
econometrics under the supervision of Prof. Bruce Hill, wrote 
his doctoral dissertation there and continued his work in 
this area even after his return to India in 1989. Dr. Bhat 
was among the foremost Bayesian econometricians in India. 


For a very long time there has been a stark dichotomy
between the classical (or frequentist) and Bayesian (or
subjectivist) approaches to statistical inference, the former
being the more commonly followed approach in real-world
applications in statistics and econometrics. A
distinctive feature of the Bayesian approach is that it involves
the systematic use of prior information, in addition to sample
information, through Bayes' theorem. In recent years
the Bayesian approach has gained much ground. It has gone
beyond merely replicating the well-known results of the
classical approach in a Bayesian setting to working out
solutions to problems that were once regarded as intractable.
Monte Carlo integration techniques have enabled researchers
to deal with some complex problems in the Bayesian context.
Bayesian procedures have been found to be useful in solving
many decision problems. Applications of the Bayesian
methodology have become quite extensive, and special issues
of some leading journals, such as the Journal of Econometrics,
have been brought out covering studies in economics and finance.


The contribution of Dr. Bhat in this volume lies in
demonstrating the immense value and wide applicability of
Bayesian methods in econometrics and economic analysis
at large. The Bayesian and classical approaches
are compared and contrasted in some detail. Some important
results relating to the “admissibility” of estimators have been
summarized. The posterior distribution of the “shrinkage”
factors of the ridge regression has been derived and a
modified version of the principal component regression has
been presented. Bayesian analysis of the linear regression
model with (i) auto-correlated disturbances and (ii) errors
in variables is an important feature of the book. Although
the general cases could not be considered on account of
analytical difficulties, the special cases considered are of
considerable interest.


The second aspect analysed in the book deals with
applications of Bayesian methods to certain issues in
economic analysis. A Bayesian expectations scheme using
prior information has been suggested, and the familiar
adaptive expectations model has been shown to be a special
case. Labour supply under uncertainty has been analysed
from the Bayesian point of view in the Expected Utility
Maximization framework.


Bayesian estimates of the consumption function based
on the Permanent Income Hypothesis have been calculated
using Klein's data, to illustrate the methodologies proposed.


Econometricians, economists and statisticians will find
a wealth of interesting material in the book. Bayesian
methods are making rapid strides. It is hoped that in the
coming years many researchers will build upon the
good work presented in this volume.

4th April, 1999                              K.L. Krishna

Delhi School of Economics

Introduction


This study proposes the application of the Bayesian
standpoint and approach to economics and to the econometric
methods employed in economic research. At the outset, we
begin with a general discussion of the Bayesian standpoint
as distinct from the Bayesian techniques that will be repeatedly
employed in this study. This demarcation is well brought
out by de Finetti (1974).¹ The Bayesian standpoint is
essentially a philosophy with an uncompromising adherence
to real-life problems. This philosophical stance is distinct
from the Bayesian techniques, which are simply applications
of the hypothetico-deductive method based on Bayes' theorem.
Bayesian techniques generally rest on this philosophical
stance, and hence it is logical to discuss it in greater depth.


The salient features of the Bayesian standpoint and its 
point of contact with other philosophies are excellently 
summarized by Brumat (1977) as follows: 


1. Eudaemonism: That everyone always pursues what
he regards as the best available course of action.


2. Coherence: Croce (1949) describes ‘coherence’ as that
“distinguishing feature of philosophy” which
separates philosophers from non-philosophers:
“Non-philosophers are those who are
not troubled by inconsistency or incoherence and do
not trouble to escape it, philosophers are those who
experience these troubles vividly.”²

3. Reunification of theory and practice: The Bayesian
standpoint rejects the theory versus practice duality.
This reunification is nicely captured by Brumat (1977):

The Bayesian standpoint eliminates the distinction
between action and contemplation, between the practical
world of the will and the theoretical world of the
intellect.

4. Subjectivity: The Bayesian standpoint renounces the
fantasy of certain and exact knowledge except in the
realm of the tautological and accepts that we live in a
world full of subjective opinions and expectations.


These features demonstrate that the Bayesian standpoint
tolerates divergent opinions while at the same time being
critical of dogmatic ones. In addition, it demands rigour of
each one in ascertaining the internal consistency of his own
opinions and insists that opinions be anchored to reality.
One of the main reasons that the Bayesian standpoint meets
with resistance is its philosophical stance. The
anti-philosophical stance is easy to criticize, as Brumat (1977)
does:


To begin with, it is obvious that everyone has a philosophy,
and simply not being aware of it, not keeping it in sight
where it can be scrutinized, should hardly be regarded as
grounds for believing one's philosophy to be superior to
others.⁴

Methods representing the Bayesian standpoint, together with
Bayesian techniques derived from Bayes' theorem, will be
jointly called ‘Bayesian’ methods. The focus of this study is
on demonstrating the usefulness and applicability of
Bayesian methods in econometrics and economic theory. In
the rest of this chapter, we shall review Classical
inference in econometrics in general terms and contrast it with
Bayesian inference (inference employing Bayesian methods) as
applicable in econometrics. In particular, the general linear
model will be taken as an example of an important tool in
econometric research and inference, and in this context both
Classical and Bayesian inference will be evaluated in terms
of their strengths and weaknesses. In addition, the scope
for applying Bayesian methods to economic theory will be
elaborated with the help of specific examples.


1.1: Classical Inference in the Linear Model: A Review 


Economic theory enables the researcher to propose
interesting economic hypotheses and issues that are vital
from the point of view of policy making. Typically, these
hypotheses are presented in the form of a model with a
particular functional relationship among the variables that are
being studied; frequently, such a relationship is specified to
be linear. A model captures important economic
relationships between key variables. The hypotheses in question
are tested using available data and with the aid of
econometric methods as applied to the general linear model
(of regression). The cornerstone of Classical inference used
in econometric research is the axiom of ‘correct specification’,
which requires the researcher to specify a unique, exhaustive
and small (relative to the number of observations) set of
explanatory variables that linearly determine the
dependent variable.⁵ Other influences are assumed to be
random and are summarized by an error term which is
spherically normally distributed. The set of explanatory
variables, represented by the matrix X, is generally assumed
to be non-stochastic or, if stochastic, fully independent of
the unknown parameters. In addition, the matrix X'X is
assumed to be non-singular. The coefficients of these
independent variables in the model constitute the unknown
‘parameters’.
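In symbols, and with notation introduced here only for compactness (y is the n × 1 vector of observations on the dependent variable, β the k × 1 vector of unknown parameters, u the error vector), the assumptions just listed amount to the familiar general linear model. This is a restatement of the verbal description above, not an equation reproduced from the study:

```latex
y = X\beta + u, \qquad u \sim N(0, \sigma^{2} I_{n}), \qquad
X \ (n \times k) \text{ non-stochastic}, \qquad \operatorname{rank}(X'X) = k < n .
```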


Using the available data, the econometrician proceeds to
estimate the unknown parameters in the model as well as to
test different hypotheses concerning the parameters.
Estimation and hypothesis testing are the two main types
of statistical inference used in econometrics. Unlike the
experimental sciences, ‘economic data’ are non-experimental
in nature and hence, in estimating the unknown parameters,
the econometrician is trying to identify relationships that
exist in historical data. It is generally not possible for the
economist to collect data on the phenomenon being
investigated under controlled conditions, such as conditions that
can be replicated in a laboratory.


We have briefly discussed the linear model together with
all the main assumptions. The unknown parameters are
frequently estimated by ordinary least squares (OLS),
which provides unbiased and efficient estimates. The OLS
procedure chooses the parameter estimates that minimize the
sum of squared residuals, that is, the estimates that fit the
data most closely in the least-squares sense. Once
the estimates are obtained, the various hypotheses are tested
using standard test statistics such as the t statistic, the
chi-square statistic and the F statistic. Since this study
will examine the problem of estimation in greater detail, we
do not discuss or review Classical and Bayesian
hypothesis testing techniques.
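The OLS estimator described above has the closed form b = (X'X)⁻¹X'y, obtained by solving the normal equations X'Xb = X'y. The following is a minimal sketch in Python, written without external libraries so that it is self-contained; the helper names (`ols`, `solve`, `matmul`, `transpose`) and the toy data are illustrative assumptions, not part of the original study.

```python
# Minimal OLS sketch: solve the normal equations X'X b = X'y.
# Pure Python (no NumPy); X is a list of rows, y a list of floats.
# Helper names are hypothetical, chosen for this illustration only.

def transpose(m):
    return [list(col) for col in zip(*m)]

def matmul(a, b):
    # zip(*b) iterates over the columns of b
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def solve(a, v):
    """Solve a @ x = v by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [v[i]] for i, row in enumerate(a)]  # augmented matrix [a | v]
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(m[i][k]))
        if abs(m[p][k]) < 1e-12:
            raise ValueError("X'X is singular")
        m[k], m[p] = m[p], m[k]
        for i in range(k + 1, n):
            f = m[i][k] / m[k][k]
            for j in range(k, n + 1):
                m[i][j] -= f * m[k][j]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def ols(X, y):
    """Return b = (X'X)^{-1} X'y."""
    Xt = transpose(X)
    XtX = matmul(Xt, X)
    Xty = [sum(a * b for a, b in zip(row, y)) for row in Xt]
    return solve(XtX, Xty)

if __name__ == "__main__":
    # Hypothetical data: intercept column plus one regressor, y = 1 + 2x exactly.
    X = [[1.0, x] for x in [0.0, 1.0, 2.0, 3.0]]
    y = [1.0 + 2.0 * row[1] for row in X]
    print(ols(X, y))  # approximately [1.0, 2.0]
```

The non-singularity assumption on X'X appears here as the `ValueError` raised when elimination meets a zero pivot.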
The principal justification of the OLS estimator is that it
is a minimum variance unbiased estimator (M.V.U.E.) and
attains the Cramér-Rao lower bound among the class of
unbiased estimators. These desirable properties of the OLS
estimator rest on the axiom of ‘correct specification’, which
is a very strong assumption to make. Leamer (1970) argues:

If the uniqueness and smallness requirements for the
explanatory variables were accepted by economic
researchers, we would find one and only one least-squares
equation for each phenomenon (at least when the research
is constrained to a single data set). If the exhaustiveness
requirement were met, there would be little concern over
outliers and autocorrelated disturbances. Quite the
contrary, econometric research is characterized by dozens
of equations all offering to ‘explain’ the same event, often
presented by the same researcher. At the same time, there
is a great concern over outliers and autocorrelation. Such
behaviour represents a clear rejection of the axiom of
specification and concomitantly casts considerable doubt
on the inferential structure of that axiom.⁶


In turn, doubt is cast on the estimation procedure most
commonly used, namely the OLS estimator based on this
inferential structure. Once the axiom of specification is
rejected, the problem of ‘misspecification’ crops up: whenever
there is a distinct possibility that the axiom of specification
is violated, the model specified by the researcher may be
misspecified. In such situations, the conclusions of the
Gauss-Markoff theorem would no longer hold, although the
researcher might not be aware of it.
These situations crop up extensively in econometrics as the
problem of ‘omitted’ variables and/or the inclusion of
irrelevant variables. In the case of omitted variables,
the OLS estimator is biased and is no longer an M.V.U.E. In
the case of included irrelevant variables, OLS is unbiased
but not minimum variance unbiased. Classical inference
based on the ‘BLUE’ property of the OLS estimator may then
not be justified. In addition, in the non-experimental situation,
one often works with weak data and has no control over
extraneous influences. For both reasons,
economists use specification searches to be able to interpret
regression models estimated from inadequate data. Leamer
expresses the anguish of the economist cogently:


For some reason, probably frustration over the inadequacy
of the available statistical tools, economists have concluded
that search techniques provide a useful means for
interpreting their data. Such procedures have little
grounding in theory, and have been subject to rather
successful attacks in a number of articles.⁷
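The omitted-variable case mentioned above can be made concrete with the standard textbook result: if the true model is y = x₁β₁ + x₂β₂ + u but y is regressed on x₁ alone, the OLS coefficient on x₁ has expectation β₁ + β₂δ, where δ = x₁'x₂ / x₁'x₁ is the slope from regressing x₂ on x₁. The sketch below checks this with noise-free hypothetical data, so the expectation is attained exactly; the function names and numbers are illustrative assumptions, not taken from the study.

```python
# Omitted-variable bias sketch (one included regressor, no intercept).
# Hypothetical true model: y = beta1*x1 + beta2*x2 with no error term,
# so the short-regression coefficient equals its expectation exactly.

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def short_regression_coef(x1, y):
    """OLS slope from regressing y on x1 alone (x2 omitted)."""
    return dot(x1, y) / dot(x1, x1)

def predicted_bias(x1, x2, beta2):
    """beta2 * delta, where delta is the OLS slope of x2 on x1."""
    return beta2 * dot(x1, x2) / dot(x1, x1)

x1 = [1.0, 2.0, 3.0, 4.0]
x2 = [2.0, 1.0, 4.0, 3.0]   # correlated with x1, so omitting it matters
beta1, beta2 = 0.5, 1.5
y = [beta1 * a + beta2 * b for a, b in zip(x1, x2)]

b_short = short_regression_coef(x1, y)
print(b_short, beta1 + predicted_bias(x1, x2, beta2))  # both approximately 1.9
```

If x₁'x₂ were zero, the bias term would vanish, which is why the damage from an omitted variable depends on its correlation with the included ones.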


Some recent articles on model selection that express
similar views and are critical of ad hoc procedures include
Hill (1983), Freedman (1983) and Freedman (1986).⁸ Thus,
the econometrician employs various types of specification
search strategies to provide a satisfactory interpretation of
the evidence. This in fact amounts to making use of uncertain
prior information that exists in addition to the available data.
Constraints that are most likely to be right are imposed on
the unknown parameter space in an attempt to improve
upon the estimation procedure. The processes of the different
types of specification searches are excellently summarized
by Leamer (1978):


If the constraints were certain, they would have been
imposed without testing. The researcher is, in fact, less
certain than this. He feels that the constraint may be
“approximately true”, but he checks with the data to make
sure. If the constraint works, he will impose it; otherwise
he will not. To put this another way, the researcher has
a priori knowledge about some parameter or some linear
combination. If the sample evidence is sufficiently strong,
he will disregard that information. Given weak evidence,
he may prefer to use his a priori estimate. By definition,
then, the intent of an interpretive search is to integrate into
the data analysis uncertain a priori information. In the
absence of such information, no interpretive search should
be performed.⁹


Classical estimation methods used in econometrics do
not generally use ‘prior’ information systematically. On the
contrary, they are procedures involving two
possibilities:

1. Ignore all ‘prior’ knowledge and report only the sample
evidence (or the likelihood function).

2. Try to fit and refit the equation with various a priori
likely constraints.

Classical inference involves judgements that are either
completely certain or completely uncertain. As Leamer (1978)
points out,


... The process of learning is a ‘herky-jerky’ reaction to the
sample evidence, consisting of phases of complete
disregard of non-sample evidence.¹⁰
Also, such ad hoc interpretive searches built on some prior
information are relevant to the reader only if he knows this
piece of ‘prior’ information and accepts this prior
knowledge. The publication of just the output of an
interpretive search is not useful, since it is equivalent to the
publication of a posterior distribution without either the
sample result or the prior. A description of these search
processes must reveal the implicit prior for it to be sensible
and intelligible.
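The ‘herky-jerky’ behaviour described above can be illustrated with a pretest estimator, a standard textbook formalization of "impose the constraint if it works". Assume, purely for illustration, that the researcher's constraint is β = 0 for a single coefficient and that "works" means the t statistic falls below a critical value. The function name and the 1.96 cutoff below are hypothetical choices, not the study's; the sketch shows that the reported estimate jumps from the constrained value straight to the unconstrained one, with no intermediate blending of prior and sample information.

```python
# Pretest estimator sketch: "impose the constraint if it survives a test".
# Hypothetical setup: constraint beta = 0, checked with a two-sided t test.

def pretest_estimate(b_ols, se, crit=1.96):
    """Return 0.0 (constraint imposed) if |t| < crit, else the OLS estimate."""
    t = b_ols / se
    return 0.0 if abs(t) < crit else b_ols

se = 0.5
for b in [0.90, 0.97, 0.99, 1.10]:
    # t = b / se; the reported estimate switches abruptly once |t| reaches 1.96
    print(b, pretest_estimate(b, se))
```

A small change in the data (b moving from 0.97 to 0.99) changes the reported estimate from 0 to 0.99, which is exactly the all-or-nothing treatment of non-sample information that Leamer criticizes.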


1.2: Bayesian Inference Applied to the Linear Model: A
Contrast from Classical Inference


Bayesian methods provide a ‘formal’ way to incorporate
prior information. In addition to the trivial cases of
overwhelming sample evidence or overwhelming non-sample
evidence, the Bayesian approach considers the non-trivial
problem of mixing the two sources of information. Prior
opinions are revised in response to the data evidence,
resulting in a posterior distribution. A Bayesian thus
explores and merely reports that region of the parameter
space favoured by the [...]. Such an analysis enables one to
interpret data generated by the regression process irrespective
of the extent of ‘multicollinearity’ in the data set or the lack
of adequate degrees of freedom in the data. Classical inference
based on Neyman-Pearson tools for interpreting data evidence is
generally not found adequate, resulting in interpretive
searches being pursued by economists. Additional problems
are created when, due to either multicollinearity or inadequate
degrees of freedom, an unconstrained model yields large
estimated standard errors on the set of unknown
parameters. The Bayesian standpoint in the context of data
evidence implies that the interpretation of the data depends
heavily upon our own personal view of the phenomenon
being examined, except in the case when the data are so
strong as to overcome all prior opinions unambiguously.
Economic data are ordinarily quite weak, and prior opinions
play an important role in the data interpreting process, be
it Bayesian or Classical. These prior opinions and
expectations about parameters can be formalised into a prior
probability distribution. A formal Bayesian incorporation of
prior opinion has three distinct advantages [see Leamer
(1970)]:

1. It forces the researcher into deep and potentially
fruitful thought concerning the process he is analysing.

2. It rules out errors in the informal interpretive search
of classical inference. The Bayesian approach ensures
rigour and internal consistency.

3. Formal priors can be communicated to readers.


The advantages of the Bayesian approach discussed above
are less pronounced when the data evidence is
strong relative to prior opinions. In the experimental sciences,
a lot of effort is spent on obtaining rich data by refining and
repeating experiments. In such situations, Classical
inference may be wholly adequate. In non-experimental
sciences, however, we are often forced to work with a single
inadequate data set. In these instances, a Bayesian analysis
of sample evidence encourages frugal use of the data
available and may prove to be a very valuable tool.
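The point that the Bayesian advantage fades as the data strengthen can be seen in the simplest conjugate case. Assume a single coefficient β with a normal prior N(m₀, 1/h₀), and a sample estimate b normally distributed around β with known precision h_d (the inverse of its sampling variance). The posterior mean is then the precision-weighted average (h₀m₀ + h_d·b)/(h₀ + h_d). This normal-normal setting, the numbers and the names below are illustrative assumptions, not the study's model; the sketch shows the posterior mean moving from the prior mean toward b as h_d grows.

```python
# Normal-normal updating sketch for one coefficient (known variances assumed).
# prior: beta ~ N(m0, 1/h0); data summary: b | beta ~ N(beta, 1/hd).

def posterior(m0, h0, b, hd):
    """Return (posterior mean, posterior precision)."""
    h1 = h0 + hd                      # precisions add
    m1 = (h0 * m0 + hd * b) / h1      # precision-weighted average of m0 and b
    return m1, h1

m0, h0 = 0.0, 4.0   # hypothetical prior: centred at 0, variance 0.25
b = 1.0             # hypothetical sample estimate
for hd in [1.0, 4.0, 100.0, 10000.0]:
    m1, h1 = posterior(m0, h0, b, hd)
    print(f"data precision {hd:>8}: posterior mean {m1:.4f}")
```

The weights h₀/(h₀ + h_d) and h_d/(h₀ + h_d) make explicit how much the prior contributes. When h_d is large relative to h₀ the posterior mean is essentially the sample estimate, which corresponds to the situation in which Classical inference may be wholly adequate.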
There exist several situations in which prior knowledge
can be incorporated to improve upon the performance of
estimation procedures. As Zellner (1985) points out,

The point that I make to you, very strongly, is that
non-Bayesians as well as Bayesians use a good deal of prior
information in building and using models. Further, since
the use of prior information is unavoidable, it should be
used carefully and more formally than has been done in
the past.¹¹


Prior information has been successfully incorporated in
several econometric models, applied studies, and models in
economic theory. Important among such studies are R.J.
Shiller (1973), W.S. Cleveland (1974), J.F. Monahan (1983),
S.J. Turnovsky (1974), R.M. Cyert and M.H. DeGroot (1974),
and V.K. Chetty (1968).¹² Furthermore, Savage and Friedman
(1981)¹³ show intimate connections between statistical theory
and the utility theory developed in microeconomics, thereby
providing considerable axiomatic support for Bayesian
inference and the Bayes procedures that have been applied in
economics and econometrics.


Bayesian methods could be usefully employed in models 
of economic theory, as seen in several of the above references. 
Models which try to capture uncertainty in decision making 
or the expectations of individual agents provide a favourable 
setting for the Bayesian approach. We hope to identify
specific situations in the microeconomic context where the
use of Bayesian methods enables one to model uncertainty 
and/or expectations utilizing information on the part of 
economic agents. 


1.3: Overview 


In this chapter, we have discussed briefly the foundations
and implications of the Bayesian approach and the potential
for its application to econometrics and economic theory.
Chapter 2 will present some basic concepts and the
analytical framework of Bayesian inference in relation to
estimation theory in econometrics. Important results relating
to the admissibility of estimators will be summarized. A full
Bayesian analysis will be developed in Chapter 3 in the
[...] the ridge estimators will be derived
by assuming a prior distribution on the unknown
parameters. The Bayes estimator will be derived and seen
to be admissible. Also, the ridge estimator will be seen to be
equivalent to the Bayes estimator. A modified version of
principal component regression will be discussed. Examples
to illustrate the procedures will be given in less general cases
where our theoretical results are approximately true. The
methodology will draw upon the creative and insightful work
of Hill (1977)¹⁴ in the context of the one-way balanced random
effects model. Our analysis will be compared and contrasted
with the work of Hill (1977). In Chapter 4, a multivariate
transformation is proposed that is seen to be a crucial,
intermediate step in obtaining a new, operational version of
the Bayes estimator. In Chapter 5, other variants of the
linear model will be considered and the analysis of the earlier
chapters will be extended to these situations.


In Chapter 6, applications to economic theory will be
considered. A Bayesian expectations scheme will be presented
that makes use of prior information. A simple duopoly
model will be presented where firms make use of prior
information in their decision making. In Chapter 7, labour
supply under uncertainty will be considered from the
Bayesian point of view. Labour supply will be seen to be
dependent on unemployment and other factors. Chapter 8
discusses the problem of estimating the unknown parameter
vector in the linear regression context when there is error in
measuring the independent variables and, in addition,
some variables may not be observed.
Chapter 9 consists of conclusions and possible further
extensions.
Elements of Bayesian Inference in Econometrics


An important task of the econometrician is to make
inferences about the unknown parameters from the available
data set. Leamer (1978) defines inference as “a logical
conclusion drawn from a set of facts.”¹ In statistical
inference, the set of facts includes the data set X and a
conditional distribution f(X|θ) that gives the probability of
the various values that the variables in X can take given the
value of θ. Given this information, Classical inference draws
conclusions about the unknown parameter vector θ.
Bayesian inference is distinct from classical inference in
that the set of facts, in addition to the data set X and
f(X|θ), contains a prior probability function f(θ). The prior
probability density function may be any non-negative
measurable function defined on the parameter space, whether
it be integrable or not. The only additional requirement that
we impose mathematically is the integrability of its product
with the likelihood function. A classicist would argue that a
distribution like f(X|θ) is an objectively verifiable feature,
whereas f(θ), in Leamer's words, “is purely a figment of
someone's imagination.”


Regardless of this point of distinction between the two 
approaches, analysis of the data proceeds in three stages: 

1. Summarization.

2. Interpretation.

3. Decision making.


The stages of summarization and interpretation jointly
constitute the process of ‘learning’, and based on this, the
decisions are made in the last stage. Unlike classical
inference, which lacks a formal interpretation phase and is
only a method of summarizing data, Bayesian inference
involves each of these three phases. Bayes' rule is used
to ‘learn’ from the data:

$$f(\theta \mid X) = \frac{f(X \mid \theta)\, f(\theta)}{f(X)}.$$

Bayes' rule in this context tells us how the data influence
the uncertain information about θ captured by the prior
probability function f(θ). f(θ|X), the posterior distribution
of θ given the data, is the basis of all decision making, and
the Bayesian phase of interpreting the evidence X is to go
from f(θ) to f(θ|X).
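The move from f(θ) to f(θ|X) can be sketched numerically by discretizing θ on a grid. The example below assumes, purely for illustration, that θ is a success probability, that X is a count of k successes in n independent trials (a binomial likelihood), and that the prior is uniform on the grid; none of these choices comes from the study. The denominator f(X) is computed as the normalizing sum over the grid.

```python
# Grid approximation of Bayes' rule: f(theta|X) = f(X|theta) f(theta) / f(X).
# Illustrative assumptions: binomial likelihood, uniform prior on a grid.
from math import comb

def grid_posterior(grid, prior, likelihood):
    """Multiply prior by likelihood pointwise and normalize by f(X)."""
    unnorm = [p * likelihood(t) for t, p in zip(grid, prior)]
    f_x = sum(unnorm)  # discrete analogue of f(X) = integral of f(X|theta) f(theta)
    return [u / f_x for u in unnorm]

n, k = 10, 7                            # hypothetical data: 7 successes in 10 trials
grid = [i / 100 for i in range(101)]    # theta = 0.00, 0.01, ..., 1.00
prior = [1 / len(grid)] * len(grid)     # uniform prior f(theta)

def likelihood(t):
    """f(X | theta) for the binomial model."""
    return comb(n, k) * t**k * (1 - t) ** (n - k)

post = grid_posterior(grid, prior, likelihood)
mode = grid[max(range(len(grid)), key=post.__getitem__)]
print(round(sum(post), 10), mode)       # posterior sums to 1; mode at 0.7
```

With a uniform prior the posterior is proportional to the likelihood, so its mode coincides with the sample proportion k/n; a non-uniform `prior` list would pull the posterior toward the values the prior favours.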


We discuss the probability foundations of Bayesian
inference and present the concepts and ideas that are used
and applied in econometrics in Section 2.1. In Section 2.2,
ridge regression and principal component regression are
evaluated in a Bayesian framework in the context of the
linear model.


2.1: Bayesian Inference: Its Probability Foundations and 
Applications in Econometrics 


Bayesian inference is implicitly based on the subjective
viewpoint of probability, as distinct from the frequentist
viewpoint of probability used in Classical inference. James
Bernoulli (1713) defines ‘subjective’ probability as a degree
of belief that the individual has in an uncertain event.
Degrees of belief are used to order the likelihoods of uncertain
events; under certain assumptions these degrees are
consistent with the probability ordering [see DeGroot (1970),
Chapter 6, for further details].² Ramsey (1926)³ introduced
the idea that probability is the person's willingness to act in
decision making situations in which eventual rewards are
uncertain. This constitutes a second approach to subjective
probability, wherein the individual is confronted with bets
that yield uncertain rewards. These ‘wagers’ or ‘bets’ are
kept internally consistent by the principle of ‘coherence’, which



ensures that the opponent cannot bet in such a way that he
is a sure winner. Probabilities thus can be measured in
terms of these betting odds, which can be shown to obey
the fundamental axioms of probability stated below.


Let U be the universal set. A function P that assigns to every
subset $A \subseteq U$ a real number P(A) is said to be a finitely
additive probability measure on U provided it satisfies the
following three axioms:

Axiom 1: For every $A \subseteq U$, $P(A) \ge 0$.
Axiom 2: $P(U) = 1$.
Axiom 3: If A and B are disjoint, then

$$P(A \cup B) = P(A) + P(B).$$


This shows that ‘degrees of belief’ measuring uncertain
knowledge about events follow the probability axioms stated
above. As Leamer points out,


Anyone who makes decisions under uncertainty in a
rational way will act as if he had degrees of belief that
obeyed the probability axioms.⁴
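The link between coherence and the axioms can be illustrated with the classic "Dutch book" argument. Suppose, as a hypothetical example, that an individual announces betting prices q(A) and q(not A) and is willing to buy or sell, at those prices, a ticket that pays 1 if the event occurs. Axioms 2 and 3 imply P(A) + P(not A) = P(U) = 1; if the announced prices do not sum to 1, an opponent can trade both tickets and win in every state of the world. The function below is our own minimal sketch of that calculation, not a construction taken from the study.

```python
# Dutch-book sketch: incoherent betting prices on an event A and its complement
# let an opponent win in every state of the world.
# A ticket on an event pays 1 if the event occurs; the individual is assumed
# willing to buy or sell each ticket at his announced price.

def opponent_profit(q_a, q_not_a, a_occurs):
    """Opponent's profit in one state when trading one ticket on A and one on not-A."""
    payout_a = 1.0 if a_occurs else 0.0
    payout_not_a = 1.0 - payout_a          # exactly one ticket pays off
    prices = q_a + q_not_a
    if prices > 1.0:
        # Opponent sells both tickets: collects the prices, pays the winning ticket.
        return prices - (payout_a + payout_not_a)
    # Otherwise the opponent buys both: pays the prices, collects the winning ticket.
    return (payout_a + payout_not_a) - prices

for q_a, q_not_a in [(0.6, 0.5), (0.2, 0.5), (0.25, 0.75)]:
    gains = [opponent_profit(q_a, q_not_a, s) for s in (True, False)]
    print(q_a, q_not_a, [round(g, 10) for g in gains])
```

The opponent's profit is the same whether or not A occurs and is positive unless the prices sum to 1; coherent prices, which obey the axioms, leave no such sure win.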


Although Ramsey's approach to subjective probability is
interesting, it has not been pursued further. A further
distinction exists among ‘subjectivists’ between ‘personalists’
and ‘necessarists’. Personalists such as de Finetti, Savage
and Ramsey argue that the quantitative measure of
knowledge varies from individual to individual, and hence
one is free to assign quantitative measures to one's beliefs as
long as they are consistent with the probability
axioms. Necessarists like Jeffreys and Keynes argue that
‘probability’ is a rational degree of belief regarding an
uncertain event. Although
this approach is interesting, it has not been pursued further.
The former view seems to be more realistic than the latter
view. The result due to de Finetti (1937)⁵ suggests
that ‘personalists’ will rationally constrain their personal
probabilities to be consistent with the relative frequencies
of these events if these are publicly known.
The ‘measurability’ of the degrees of belief is another
problem in all inference based on subjective probability.
Bayesian inference proceeds as if these degrees of belief are
correctly measurable and at the same time allows for the
consequences of measurement error. The central result
governing all Bayesian inference is Bayes' rule (or Bayes'
theorem):

$$P(A \mid B) = \frac{P(A \cap B)}{P(B)} \quad \text{if } P(B) > 0,$$

and

$$P(A \mid B) = \frac{P(B \mid A)\, P(A)}{P(B)}, \qquad P(B) > 0.$$

The principle of coherence is used to prove this result.⁶ Bayes'
rule implies a complete theory of learning, in that the personal
probabilities of events are updated by conditioning on actual
events.
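A small numerical check of Bayes' theorem may help. The numbers are hypothetical: an event A with prior probability 0.01, and an observed event B with P(B|A) = 0.95 and P(B|not A) = 0.05. P(B) is obtained from the law of total probability and P(A|B) from Bayes' theorem; the sketch also confirms that the result equals P(A ∩ B)/P(B). The function name is an illustrative assumption.

```python
# Numerical check of Bayes' theorem with hypothetical probabilities.

def bayes_posterior(p_a, p_b_given_a, p_b_given_not_a):
    """Return (P(B), P(A|B)) via the law of total probability and Bayes' rule."""
    p_b = p_b_given_a * p_a + p_b_given_not_a * (1.0 - p_a)
    if p_b <= 0.0:
        raise ValueError("P(B) must be positive to condition on B")
    return p_b, p_b_given_a * p_a / p_b

p_a, p_b_given_a, p_b_given_not_a = 0.01, 0.95, 0.05
p_b, p_a_given_b = bayes_posterior(p_a, p_b_given_a, p_b_given_not_a)

p_a_and_b = p_b_given_a * p_a                  # P(A and B)
print(round(p_b, 4), round(p_a_given_b, 4))    # 0.059 0.161
```

Even with a fairly reliable signal, the low prior probability keeps P(A|B) near 0.16: the personal probability of A is updated by conditioning on B, not replaced by the reliability of the signal.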

In this section, we proceed to discuss and review the 
basic concepts of Bayesian inference that are useful and 
are applied in the estimation of unknown parameters in 
linear regression models. The typical problem of estimation 
faced in econometrics is a point estimation problem with a 
conveniently chosen quadratic loss function. Classical 
inference compares estimators and estimation procedures 
in terms of their ‘expected loss’ or their mean square error, 
and determines whether a particular estimator is admissible. 
Formally, admissibility of an estimator is defined as follows: 


1. Given the estimator a(y) of some parameter β, the
expected loss conditional on β is called the risk function
R(β, a). Thus,

$$R(\beta, a) = E\left[(\beta - a)'\, Q\, (\beta - a)\right].$$

In the special case where the matrix of weights Q is an
identity matrix, the risk function of a is equivalent to
the trace of the mean square error matrix of a.

2. An estimator a₁(y) is said to be inadmissible if there
exists a₂(y) such that $R(\beta, a_2) \le R(\beta, a_1)$ for all β, with
strict inequality for at least one value of β. Otherwise,
a₁(y) is said to be ADMISSIBLE.⁷
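To make the definitions concrete, consider a hypothetical scalar case (our assumption, not the study's example): y ~ N(β, 1), Q = 1 and quadratic loss. The estimator a₁(y) = y has risk 1 for every β; a₂(y) = y + 1 has risk 2, so a₂ is dominated by a₁ and is inadmissible; the shrinkage estimator a₃(y) = c·y has risk c² + (1 − c)²β², which beats a₁ near β = 0 but loses for large |β|, so neither dominates the other. The sketch evaluates these closed-form risks on a grid of β values, and a Monte Carlo helper confirms one of them. Checking dominance on a finite grid only approximates the "for all β" condition.

```python
# Risk functions under quadratic loss for y ~ N(beta, 1) (hypothetical example).
# Closed forms: R(beta, a) = variance + bias^2 of the estimator a(y).
import random

def risk_identity(beta):          # a1(y) = y: unbiased, variance 1
    return 1.0

def risk_shifted(beta):           # a2(y) = y + 1: variance 1, bias 1
    return 1.0 + 1.0 ** 2

def risk_shrink(beta, c=0.5):     # a3(y) = c*y: variance c^2, bias (c - 1)*beta
    return c ** 2 + (1.0 - c) ** 2 * beta ** 2

def dominates(r_new, r_old, grid):
    """True if r_new <= r_old everywhere on the grid and strictly somewhere."""
    diffs = [r_new(b) - r_old(b) for b in grid]
    return all(d <= 0 for d in diffs) and any(d < 0 for d in diffs)

def mc_risk(estimator, beta, draws=100_000, seed=0):
    """Monte Carlo estimate of E[(beta - a(y))^2] with y ~ N(beta, 1)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(draws):
        y = rng.gauss(beta, 1.0)
        total += (beta - estimator(y)) ** 2
    return total / draws

grid = [i / 10 for i in range(-50, 51)]              # beta from -5 to 5
print(dominates(risk_identity, risk_shifted, grid))  # True: a2 is inadmissible
print(dominates(risk_shrink, risk_identity, grid))   # False
print(dominates(risk_identity, risk_shrink, grid))   # False
print(mc_risk(lambda y: 0.5 * y, 2.0), risk_shrink(2.0))
```

Both a₁ and a₃ survive the dominance check, which illustrates why admissibility alone leaves a large class of estimators to choose from.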


However, admissibility of an estimator is not highly useful,
since the class of admissible estimators is large.
Classical inference essentially excludes inadmissible
estimators, leaving a whole class of admissible estimators
from which to choose. ‘Prior’ information, if it exists, can be
used to choose among admissible estimators. A Bayesian
analysis of the data proceeds by combining prior
knowledge with the sample information obtained from the
data, which yields Bayes estimators. The difficulty with a fully
Bayesian analysis is that a prior distribution of the
unknown parameters of the model needs to be specified.


Before proceeding to discuss a complete Bayesian analysis of the estimation problem, we discuss a framework in which to analyse interpretive search estimators, which further highlights the contrast between the Classical and Bayesian approaches. The imposition of constraints on parameters in the parameter space leads to a family of ‘constrained’ estimators, which are the interpretive search estimators obtained in Classical inference without any formal theory. In general, we can think of the set of constraints as

$$R\beta = 0 \quad (\text{or } R\beta = r),$$

where $\beta$ is the unknown parameter. The following theorem (stated without proof), due to Leamer (1978), shows that the family of constrained estimators lies on an ellipsoid.


Theorem 2.1: A constrained least squares estimate computed subject to a set of constraints $R\beta = 0$ lies on the ellipsoid

$$(\beta - b/2)'(X'X)(\beta - b/2) = b'X'Xb/4,$$

where $b$ is the unconstrained least squares vector. Furthermore, any point on this ellipsoid is a constrained estimate for some $R$.
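Theorem 2.1 can be checked numerically. The sketch below uses simulated data and randomly drawn constraint matrices $R$ (both assumptions made only for illustration), computes the constrained least squares estimate subject to $R\beta = 0$ with the usual Lagrange-multiplier formula, and confirms that it satisfies the ellipsoid equation. The origin and $b$ itself also lie on the ellipsoid.

```python
import numpy as np

rng = np.random.default_rng(0)
T, K = 50, 4
X = rng.normal(size=(T, K))                        # simulated regressors (assumption)
y = X @ np.array([1.0, -0.5, 2.0, 0.3]) + rng.normal(size=T)

XtX = X.T @ X
b = np.linalg.solve(XtX, X.T @ y)                  # unconstrained least squares vector
scale = float(b @ XtX @ b)                         # b'X'Xb, used for a relative tolerance


def constrained_ls(R: np.ndarray) -> np.ndarray:
    """Least squares estimate subject to R beta = 0, for R of full row rank."""
    XtX_inv_Rt = np.linalg.solve(XtX, R.T)         # (X'X)^{-1} R'
    lagrange = np.linalg.solve(R @ XtX_inv_Rt, R @ b)
    return b - XtX_inv_Rt @ lagrange


def on_feasible_ellipsoid(beta: np.ndarray, rtol: float = 1e-8) -> bool:
    """Check (beta - b/2)' X'X (beta - b/2) = b'X'Xb / 4 up to a relative tolerance."""
    d = beta - b / 2
    return abs(float(d @ XtX @ d) - scale / 4) <= rtol * scale


for q in range(1, K):                              # q randomly drawn constraints
    R = rng.normal(size=(q, K))
    print(q, on_feasible_ellipsoid(constrained_ls(R)))
```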
The feasible ellipsoid consisting of all the constrained estimates is a translated likelihood ellipse passing through the origin and $b$, the ordinary least squares vector. An ‘interpretive search’ estimator is a procedure for selecting points from the set of feasible points on the ellipsoid. More generally, it can be written as a weighted average of points on the feasible ellipsoid:


$$\hat{\beta} = \int_R w(R)\,\hat{\beta}(R)\,dR, \qquad \int_R w(R)\,dR = 1,$$

where $\hat{\beta}(R)$ is a constrained estimate and $w(R)$ is a weight function. In practice, a finite subset of the feasible points is considered, determined by a set of $k$ linearly independent constraints. Three steps serve as the basis for formulating the interpretive search estimator:
1. An origin is selected, which is the origin of the feasible ellipsoid generating the family of constrained estimates.
2. A set of $k$ linearly independent constraints is identified, which determines the coordinate system. This shrinks the ellipsoidal continuum of points to a set of $2^k$ points.
3. Applying the weights given by the weighting function $w_i(y, X)$, which may depend on the data, to the family of constrained estimates yields the interpretive search estimator.
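The three steps can be sketched as follows, assuming (for illustration only) the origin at zero and the coordinate system defined by the $K$ exclusion constraints $\beta_i = 0$, so that each of the $2^K$ constrained estimates is a subset regression. The weight function is a hypothetical user-supplied callable; its choice is exactly what the text leaves open.

```python
import itertools

import numpy as np


def subset_estimates(X: np.ndarray, y: np.ndarray) -> dict:
    """Constrained LS estimates for every subset of the constraints beta_i = 0.

    Keys are tuples of included (unconstrained) coefficient indices: the empty
    tuple gives the origin and the full tuple gives the OLS vector b.
    """
    K = X.shape[1]
    estimates = {}
    for r in range(K + 1):
        for included in itertools.combinations(range(K), r):
            beta = np.zeros(K)
            if included:
                cols = list(included)
                beta[cols] = np.linalg.lstsq(X[:, cols], y, rcond=None)[0]
            estimates[included] = beta
    return estimates


def interpretive_search(estimates: dict, weight) -> np.ndarray:
    """Weighted average of the constrained estimates; `weight` is a hypothetical function of the key."""
    w = np.array([weight(key) for key in estimates], dtype=float)
    w = w / w.sum()                                # normalise so the weights sum to one
    return sum(wi * est for wi, est in zip(w, estimates.values()))


rng = np.random.default_rng(1)
X = rng.normal(size=(40, 3))
y = X @ np.array([1.0, 0.0, -2.0]) + rng.normal(size=40)
est = subset_estimates(X, y)
print(len(est))                                    # 2**3 = 8 constrained estimates
beta_uniform = interpretive_search(est, lambda key: 1.0)
```

Putting all the weight on the full subset recovers OLS, and all the weight on the empty subset returns the origin.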


Classical analysis of interpretive search strategies does not offer much on the choice of the origin or of the coordinate system. Only two points on this feasible ellipsoid end up being chosen, namely the origin and the least squares point $b$, and only special weight functions are used. In contrast, Bayesian analysis offers more in this regard. The mean of the prior distribution on the unknown parameters corresponds to the choice of the origin of the feasible ellipsoid, and the prior also determines the coordinate system. All that remains is to specify the weight function that yields the interpretive search estimator. We apply this framework of interpretive search estimators and evaluate these estimators from the Bayesian standpoint.
Several ‘biased’ estimators have become popular in the econometrics literature and are widely used as alternatives to the OLS procedure, each attempting to remedy deficiencies of the OLS estimator. Chief among these are Ridge regression [Hoerl and Kennard (1970)], Stein rule regression [James and Stein (1961)] and Principal Component regression [Massy (1965), Greenberg (1975), Johnson, Reimer, and Rothrock (1973), Judge et al. (1980)]. We shall focus on Ridge regression and Principal Component regression in the context of the linear regression model used in econometrics, present them as interpretive search estimators, and evaluate them in a Bayesian framework.


2.2: Bayesian Evaluation of Ridge Regression and Principal Component Regression


The linear model commonly used in econometric modelling is

$$y = X\beta + \varepsilon \qquad (2.1)$$

where:

y : $T \times 1$ vector of observations on the dependent variable

X : $T \times K$ matrix of explanatory variables, all of which are assumed to be stochastic, distributed independently of $(\beta, \sigma^2)$; the matrix $X'X$ is generally assumed to be nonsingular.

β : $K \times 1$ vector of unknown parameters of the model

ε : $T \times 1$ vector of spherical disturbances such that
$E(\varepsilon \mid X) = 0$,
$E(\varepsilon\varepsilon' \mid X) = \sigma^2 I_T$.


This implies that the disturbances are homoscedastic and lack any serial correlation. The unknown vector of parameters is generally estimated by the OLS procedure, given by

$$\hat{\beta}(0) = (X'X)^{-1}X'y.$$


We will consider two alternatives to the OLS estimator. The first alternative to be considered is the Ridge regression estimator. The second estimator to be considered is the Principal component estimator.
(1) Ridge Regression: The generalized ridge estimator (GRE) is defined as

$$\hat{\beta}(K) = (X'X + K)^{-1}X'y,$$

where $K = \mathrm{Diag}(k_1, k_2, \ldots, k_K)$ with $k_i > 0$ for all $i = 1, 2, \ldots, K$. The ordinary ridge estimator (ORE) due to Hoerl and Kennard (1970) is defined as

$$\hat{\beta}(k) = (X'X + kI)^{-1}X'y, \qquad k > 0.$$

In the special case when $k_i = k$ for all $i$, the GRE reduces to the ORE. The GRE $\hat{\beta}(K)$ can be seen to be derived from a prior on $\beta$ of the form $N[0, \Sigma_\beta]$, where $\Sigma_\beta = \mathrm{Diag}(\sigma^2_{\beta_1}, \ldots, \sigma^2_{\beta_K})$ and $k_i = \sigma^2/\sigma^2_{\beta_i}$.
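The two ridge estimators can be computed directly from their definitions. The sketch below uses simulated data (an assumption) and checks that the ORE is the GRE with equal $k_i$, that $(X'X + K)^{-1}X'y = (X'X + K)^{-1}(X'X)\hat{\beta}(0)$, and that the ORE shrinks the coefficient vector relative to OLS.

```python
import numpy as np


def gre(X: np.ndarray, y: np.ndarray, k: np.ndarray) -> np.ndarray:
    """Generalized ridge estimator (X'X + K)^{-1} X'y with K = Diag(k_1, ..., k_K), k_i > 0."""
    k = np.asarray(k, dtype=float)
    if np.any(k <= 0):
        raise ValueError("all ridge constants k_i must be positive")
    return np.linalg.solve(X.T @ X + np.diag(k), X.T @ y)


def ore(X: np.ndarray, y: np.ndarray, k: float) -> np.ndarray:
    """Ordinary ridge estimator of Hoerl and Kennard: the GRE with k_i = k for all i."""
    return gre(X, y, np.full(X.shape[1], k))


rng = np.random.default_rng(2)
X = rng.normal(size=(30, 3))
y = X @ np.array([0.5, -1.0, 1.5]) + rng.normal(size=30)
b_ols = np.linalg.solve(X.T @ X, X.T @ y)                 # beta_hat(0)
b_ridge = ore(X, y, 2.0)
print(np.linalg.norm(b_ridge) < np.linalg.norm(b_ols))    # ridge shrinks the coefficient vector
```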

We will prove in the following theorem that the posterior mean of $\beta$ is

$$E[\beta \mid y, X] = (X'X + \sigma^2\Sigma_\beta^{-1})^{-1}(X'X)\hat{\beta}(0),$$

given the data, the variance $\sigma^2$ and $\Sigma_\beta$ (the matrix of prior variances). The theorem and the proof follow Theorem 3.9 of Leamer (1978).


Theorem 2.2: Consider the linear regression model in (2.1). Assume a normal prior on $\beta$, which is $N[\mu, \Sigma_\beta]$. Then the posterior mean of $\beta$, which is the Bayes estimator under squared error loss, is given by

$$E[\beta \mid y, X, \sigma^2, \Sigma_\beta] = W_\beta(y, X, K)\,\hat{\beta}(0) + \left(I - W_\beta(y, X, K)\right)\mu,$$

where

$$W_\beta(y, X, K) = (X'X + \sigma^2\Sigma_\beta^{-1})^{-1}(X'X).$$


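Proof: Considering the linear regression model in (2.1), the likelihood function is written as

$$L(\beta, \sigma^2) \propto (\sigma^2)^{-T/2}\exp\left[-(y - X\beta)'(y - X\beta)/2\sigma^2\right].$$

The normal prior on $\beta$ is written as

$$p(\beta) \propto |\Sigma_\beta|^{-1/2}\exp\left[-\tfrac{1}{2}(\beta - \mu)'\Sigma_\beta^{-1}(\beta - \mu)\right].$$

By Bayes' theorem, the posterior density of $\beta$ given $\sigma^2$ and $\Sigma_\beta$ is

$$p''(\beta \mid y, X) \propto \exp\left\{-\tfrac{1}{2}\left[\beta'X'X\beta/\sigma^2 - 2\beta'X'y/\sigma^2 + (\beta - \mu)'\Sigma_\beta^{-1}(\beta - \mu)\right]\right\}.$$

On regrouping the terms and using an identity for quadratic forms [see Leamer (1978), Appendix I, (T10), p. 324], the posterior is rewritten as

$$p''(\beta \mid y, X) \propto \exp\left[-\tfrac{1}{2}(\beta - b^{**})'(B^{**})^{-1}(\beta - b^{**})\right],$$

where

$$b^{**} = W_\beta(y, X, K)\,\hat{\beta}(0) + \left(I - W_\beta(y, X, K)\right)\mu, \qquad B^{**} = \left(\sigma^{-2}X'X + \Sigma_\beta^{-1}\right)^{-1}.$$

Given $\sigma^2$ and $\Sigma_\beta$, the posterior density of $\beta$ is therefore multivariate normal with mean $b^{**}$ and covariance matrix $B^{**}$. This proves that the posterior mean of $\beta$ is given by

$$E[\beta \mid y, X, \sigma^2, \Sigma_\beta] = b^{**} = W_\beta(y, X, K)\,\hat{\beta}(0) + \left(I - W_\beta(y, X, K)\right)\mu.$$

In cases where $\mu = 0$, as assumed in our case of the ridge estimator discussed above, the posterior mean reduces to

$$E[\beta \mid y, X, \sigma^2, \Sigma_\beta] = b^{**} = W_\beta(y, X, K)\,\hat{\beta}(0).$$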
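The closed form in Theorem 2.2 can be cross-checked against the familiar precision-weighted form of the normal posterior mean, $(\sigma^{-2}X'X + \Sigma_\beta^{-1})^{-1}(\sigma^{-2}X'y + \Sigma_\beta^{-1}\mu)$. The sketch assumes simulated data and arbitrary values of $\sigma^2$, $\Sigma_\beta$ and $\mu$; with $\mu = 0$ it also reproduces the GRE with $K = \sigma^2\Sigma_\beta^{-1}$.

```python
import numpy as np


def posterior_mean_theorem(X, y, sigma2, Sigma_beta, mu):
    """Theorem 2.2: W beta_hat(0) + (I - W) mu, with W = (X'X + sigma2 Sigma_beta^{-1})^{-1} X'X."""
    XtX = X.T @ X
    b0 = np.linalg.solve(XtX, X.T @ y)
    W = np.linalg.solve(XtX + sigma2 * np.linalg.inv(Sigma_beta), XtX)
    return W @ b0 + (np.eye(len(mu)) - W) @ mu


def posterior_mean_precision(X, y, sigma2, Sigma_beta, mu):
    """The same posterior mean written as a precision-weighted average of data and prior."""
    P_prior = np.linalg.inv(Sigma_beta)
    P_post = X.T @ X / sigma2 + P_prior            # inverse of the posterior covariance B**
    return np.linalg.solve(P_post, X.T @ y / sigma2 + P_prior @ mu)


rng = np.random.default_rng(3)
X = rng.normal(size=(25, 3))
y = X @ np.array([1.0, 2.0, -1.0]) + 0.5 * rng.normal(size=25)
sigma2, Sigma_beta, mu = 0.25, np.diag([1.0, 4.0, 0.5]), np.array([0.5, 1.0, 0.0])

print(posterior_mean_theorem(X, y, sigma2, Sigma_beta, mu))
print(posterior_mean_precision(X, y, sigma2, Sigma_beta, mu))
```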
The mean of the prior on $\beta$ determines the feasible ellipsoid and its origin, on which the family of constrained estimates lies. The feasible ellipsoid continuum is reduced by the imposition of $k$ linearly independent constraints, given by $I\beta = 0$, which determine the coordinate system. The GRE (or the ORE) then weights the prior mean ($b^* = 0$) and the OLS summary of evidence $\hat{\beta}(0)$ by the weighting function $W_\beta(y, X, K)$ to give

$$\hat{\beta}(K) = W_\beta(y, X, K)\,\hat{\beta}(0) + \left[I - W_\beta(y, X, K)\right]0.$$

In cases where the prior mean is distinct from the origin, say $b^*$, the ridge estimator is generalized to include a nonzero prior mean and is written as

$$\hat{\beta}(K) = W_\beta(y, X, K)\,\hat{\beta}(0) + \left[I - W_\beta(y, X, K)\right]b^*.$$

Both variants are clearly seen to be interpretive search estimators.
(2) Principal Component Regression: The principal component estimator discussed by Massy (1965) and Judge et al. (1980) is defined as

$$\hat{\beta}_{PC} = (A_1A_1')\hat{\beta}(0),$$

where $A_1$ is a submatrix of $A$ and $A$ satisfies

$$A'A = AA' = I, \qquad A'X'XA = \Lambda,$$

where $\Lambda = \mathrm{Diag}(\lambda_1, \lambda_2, \ldots, \lambda_K)$, with $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_K \ge 0$. The matrix $A = (A_1 : A_2)$ is a $K \times K$ matrix; $A_1$ is a $K \times K_1$ matrix and $A_2$ is a $K \times (K - K_1)$ matrix. An important result for a Bayesian interpretation of the principal component estimator is that, given a spherical prior with mean 0 and covariance matrix $I$, the contract curve consisting of all the modes of the posterior distribution is a weighted combination of the $(k + 1)$ principal components. The contract curve consists of all the posterior modes that cannot be improved upon, and constitutes a Pareto efficient set. The prior mean determines the origin of the feasible ellipsoid on which the principal components lie. These are reduced to $k$ points by the set of linear constraints $I\beta = 0$, which determines the coordinate system.

The principal component estimator then weights the origin and the OLS point with a weighting function that depends on the set of included variables in the regression. This is expressed as

$$\hat{\beta}_{PC} = (A_1A_1')\hat{\beta}(0) + (A_2A_2')\,0;$$

the matrix of coefficients $(A_1A_1')$ is determined once the set of variables to be included in the regression is determined. The principal component estimator is thus seen as an interpretive search estimator.
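The principal component estimator is easy to compute from an eigendecomposition of $X'X$. The sketch below (simulated, nearly collinear data; the number of retained components $K_1 = 3$ is an arbitrary assumption) builds $(A_1A_1')\hat{\beta}(0)$ and checks that it equals the estimate obtained by regressing $y$ on the first $K_1$ principal components $Z_1 = XA_1$ and mapping back with $A_1$.

```python
import numpy as np


def principal_component_estimator(X: np.ndarray, y: np.ndarray, k1: int) -> np.ndarray:
    """beta_PC = A_1 A_1' beta_hat(0), with A_1 the eigenvectors of X'X for the k1 largest roots."""
    XtX = X.T @ X
    lam, A = np.linalg.eigh(XtX)                  # eigh returns the roots in ascending order
    order = np.argsort(lam)[::-1]                 # reorder so that lambda_1 >= ... >= lambda_K
    A1 = A[:, order][:, :k1]
    b0 = np.linalg.solve(XtX, X.T @ y)
    return A1 @ A1.T @ b0


rng = np.random.default_rng(4)
X = rng.normal(size=(60, 4))
X[:, 3] = X[:, 2] + 0.01 * rng.normal(size=60)    # nearly collinear column
y = X @ np.array([1.0, -1.0, 0.5, 0.5]) + rng.normal(size=60)
beta_pc = principal_component_estimator(X, y, k1=3)
print(beta_pc)
```

Retaining all $K$ components gives back the OLS estimator, since $AA' = I$.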


The admissibility properties of estimators are defined next, and in doing so we state (without proof) the main result of Bayesian inference as applied to estimation theory.

Theorem 2.3: Consider a general quadratic loss function

$$L(\hat{\beta}, \beta) = (\hat{\beta} - \beta)'Q(\hat{\beta} - \beta),$$

where $\beta$ is the $K \times 1$ vector of unknown parameters, $Q$ is a symmetric positive definite matrix and $\hat{\beta}$ is any estimator of $\beta$. Given $\beta$, the expected loss conditional on $\beta$ is the risk function, represented by $R(\beta, \hat{\beta})$. The Bayes risk of $\hat{\beta}$ is then defined as

$$B_{\hat{\beta}} = E\left[R(\beta, \hat{\beta})\right],$$

where the expectation is taken with respect to the prior on $\beta$. Then the following two results hold:

1. Let $\bar{\beta}(y)$ be the posterior mean of $\beta$. Then $\bar{\beta}(y)$ minimizes $E[L(\hat{\beta}, \beta)]$ and is the Bayes estimator of $\beta$, regardless of the choice of $Q$.

2. In addition, if a ‘normal’ linear regression model and a proper prior on $\beta$ are assumed, then the posterior mean $\bar{\beta}(y)$ is an admissible estimator.
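A scalar Monte Carlo sketch illustrates the first part of Theorem 2.3. The normal-normal setup, the variances and the number of draws are assumptions made for the example: with $y \mid \beta \sim N(\beta, \sigma^2)$ and $\beta \sim N(0, \tau^2)$, the posterior mean $\tau^2 y/(\tau^2 + \sigma^2)$ has Bayes risk $\sigma^2\tau^2/(\sigma^2 + \tau^2)$, smaller than the Bayes risk $\sigma^2$ of the unbiased rule $a(y) = y$.

```python
import numpy as np


def bayes_risks(sigma2: float, tau2: float, n_draws: int = 200_000, seed: int = 0):
    """Monte Carlo Bayes risk of y and of the posterior mean under squared error loss."""
    rng = np.random.default_rng(seed)
    beta = rng.normal(0.0, np.sqrt(tau2), size=n_draws)         # draws from the prior
    y = beta + rng.normal(0.0, np.sqrt(sigma2), size=n_draws)    # one observation per draw
    post_mean = tau2 / (tau2 + sigma2) * y
    risk_unbiased = np.mean((y - beta) ** 2)
    risk_bayes = np.mean((post_mean - beta) ** 2)
    return risk_unbiased, risk_bayes


r_y, r_pm = bayes_risks(sigma2=1.0, tau2=2.0)
exact_bayes = 1.0 * 2.0 / (1.0 + 2.0)                              # sigma2 * tau2 / (sigma2 + tau2)
print(r_y, r_pm, exact_bayes)
```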
These results will be used in evaluating the admissibility of the estimators discussed.

(1) For the normal linear regression model and a proper prior on $\beta$ assumed to be $N[0, \Sigma_\beta]$, where $\Sigma_\beta = \mathrm{Diag}(\sigma^2_{\beta_1}, \ldots, \sigma^2_{\beta_K})$, the posterior mean of $\beta$, as seen before, is

$$E[\beta \mid y, X, \sigma^2, \Sigma_\beta] = \hat{\beta}(B) = (X'X + K)^{-1}(X'X)\hat{\beta}(0),$$

where $K = \sigma^2\Sigma_\beta^{-1}$. By the application of Theorem 2.3, the Bayes estimator of $\beta$ for a squared error loss function is $\hat{\beta}(B)$, which is also admissible given $\sigma^2$ and $\Sigma_\beta$. A similar result is also discussed by Lindley and Smith. Thus, we see that the Bayes estimator and the GRE are equivalent. That is, given $K$,

$$\hat{\beta}(K) = \hat{\beta}(B) = (X'X + K)^{-1}(X'X)\hat{\beta}(0)$$

is the posterior mean and hence the Bayes estimator of $\beta$. Taking the expectation with respect to $K$,

$$\hat{\beta}(B) = E\left[(X'X + K)^{-1} \mid y, X\right](X'X)\hat{\beta}(0)$$

is the Bayes estimator of $\beta$.


Empirical Bayes procedures proposed in the literature attempt to estimate the elements of $K$ from the sample. This procedure is crude: once $K$ is replaced by a sample estimate, the estimator $\hat{\beta}(K)$ no longer has the theoretically desirable properties established above. This is a drawback of empirical Bayes procedures. A fully Bayesian analysis will be proposed to obtain $E[(X'X + K)^{-1} \mid y, X]$, which will enable us to compute the estimator $\hat{\beta}(K)$.
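Once draws of $K$ from its posterior are available, the fully Bayesian estimator $E[(X'X + K)^{-1} \mid y, X](X'X)\hat{\beta}(0)$ can be approximated by averaging the shrinkage matrix over the draws. The sketch below only shows this averaging step. The gamma draws used for the diagonal of $K$ are a hypothetical stand-in for a posterior sampler, which the text has not yet derived, and the helper name `averaged_ridge` is illustrative.

```python
import numpy as np


def averaged_ridge(X: np.ndarray, y: np.ndarray, k_draws: np.ndarray) -> np.ndarray:
    """Approximate E[(X'X + K)^{-1} | y, X] (X'X) beta_hat(0) by a Monte Carlo average.

    k_draws has shape (n_draws, K); each row is one draw of the diagonal of K.
    """
    XtX = X.T @ X
    b0 = np.linalg.solve(XtX, X.T @ y)
    shrink = np.mean([np.linalg.inv(XtX + np.diag(k)) for k in k_draws], axis=0)
    return shrink @ XtX @ b0


rng = np.random.default_rng(5)
X = rng.normal(size=(40, 3))
y = X @ np.array([1.0, 0.0, -1.0]) + rng.normal(size=40)
k_draws = rng.gamma(shape=2.0, scale=0.5, size=(1000, 3))   # hypothetical posterior draws of k_i
beta_bayes = averaged_ridge(X, y, k_draws)
print(beta_bayes)
```

Averaging $(X'X + K)^{-1}$ over the draws is not the same as plugging in an average value of $K$, since the inverse is nonlinear in $K$.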


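(2) Principal component regression is seen to be a special and a limiting case of the Bayes estimator. In terms of the normal linear model with a proper prior $\beta \sim N[0, \sigma^2_\beta I]$, the Bayes estimator for $\beta$, as seen earlier, is

$$\hat{\beta}(K) = (X'X + K)^{-1}(X'X)\hat{\beta}(0),$$

given $K = (\sigma^2/\sigma^2_\beta)I$, $y$ and $X$. In terms of the reparametrized vector $\alpha = A'\beta$ and $Z = XA$, the Bayes estimator for $\alpha$ is

$$\hat{\alpha}(K) = (\Lambda + K)^{-1}\Lambda\,\hat{\alpha}(0),$$

given $K = (\sigma^2/\sigma^2_\beta)I$, $y$ and $\lambda_i$ for all $i$. The prior on $\alpha$ is $N[0, \sigma^2_\beta I]$. Thus,

$$\hat{\alpha}_i(K) = \frac{\lambda_i\sigma^2_\beta}{\lambda_i\sigma^2_\beta + \sigma^2}\,\hat{\alpha}_i(0) \quad \text{for all } i = 1, 2, \ldots, K,$$

given $\sigma^2$, $\sigma^2_\beta$, $\lambda_i$ and $y$. The principal component estimator, as seen earlier, is written as

$$\hat{\beta}_{PC} = (A_1A_1')\hat{\beta}(0).$$

For the reparametrized model, the principal component estimator of $\alpha$ is $\hat{\alpha}_{PC} = A'\hat{\beta}_{PC} = A'(A_1A_1')A\,\hat{\alpha}(0)$, so that on simplification we have

$$\hat{\alpha}_{PC,i} = \begin{cases} \hat{\alpha}_i(0), & i \in I, \\ 0, & i \in J, \end{cases}$$

where $I$ denotes the set of included and $J$ the set of excluded components.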
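The componentwise form of the Bayes estimator can be checked numerically. With $A$ the orthonormal eigenvectors of $X'X$ and $\hat{\alpha}(0) = A'\hat{\beta}(0)$, shrinking each $\hat{\alpha}_i(0)$ by $\lambda_i\sigma^2_\beta/(\lambda_i\sigma^2_\beta + \sigma^2)$ and transforming back with $\beta = A\alpha$ should reproduce $(X'X + K)^{-1}(X'X)\hat{\beta}(0)$ with $K = (\sigma^2/\sigma^2_\beta)I$. The data and the values of $\sigma^2$ and $\sigma^2_\beta$ are assumptions for the sketch.

```python
import numpy as np

rng = np.random.default_rng(6)
X = rng.normal(size=(50, 4))
y = X @ np.array([2.0, -1.0, 0.0, 1.0]) + rng.normal(size=50)
sigma2, sigma2_beta = 1.0, 0.5                       # assumed error and prior variances

XtX = X.T @ X
lam, A = np.linalg.eigh(XtX)                         # A'A = AA' = I and A'X'XA = Diag(lam)
b0 = np.linalg.solve(XtX, X.T @ y)
alpha0 = A.T @ b0                                    # OLS estimate of alpha = A' beta

shrink = lam * sigma2_beta / (lam * sigma2_beta + sigma2)   # one shrinkage factor per component
alpha_bayes = shrink * alpha0
beta_from_alpha = A @ alpha_bayes                    # back to beta via beta = A alpha

K = (sigma2 / sigma2_beta) * np.eye(4)
beta_direct = np.linalg.solve(XtX + K, XtX @ b0)
print(np.allclose(beta_from_alpha, beta_direct))
```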
Case A. Following Oman (1978), the principal component estimator is seen as a limiting case of the Bayes estimator. Consider the parameter space $\Omega_M$ given by $\Omega_M = \{(\sigma, \beta) : \sigma > 0,\ \beta'\beta/\sigma^2 < M\}$. If, for the set $I$ of variables included in the regression, $\sigma^2_{\alpha_i} \to \infty$ for all $i \in I$, and, for the set $J$ of variables excluded from the regression, $\sigma^2_{\alpha_i} \to 0$ for all $i \in J$, then

$$\hat{\alpha}_i \to \begin{cases} \hat{\alpha}_i(0), & i \in I, \\ 0, & i \in J, \end{cases}$$

which is the principal component estimator, and is seen to be a limiting case of the Bayes estimator. For the unbounded parameter space $\Omega_\infty$, which is defined by $\Omega_\infty = \{(\sigma, \beta) : \sigma \ge 0,\ \beta'\beta/\sigma^2 < \infty\}$, in the unlikely event that $\sigma = 0$ the Bayes estimator would collapse to the principal component estimator. Once more, the principal component estimator is seen to be a limiting case of the Bayes estimator as $\sigma^2 \to 0$. For $\Omega_\infty$, the principal component estimator is seen to be a limiting case of the Bayes estimator without making any explicit assumptions on the prior variance $\sigma^2_\beta$.


Principal component regression assumes an asymmetry with respect to the prior knowledge assumed by the investigator regarding the unknown parameters to be estimated. For the set of included variables $I$, no prior information exists about how the dependent variable is related to the variables in $I$, and at the same time one is absolutely certain that the variables in $J$ are to be excluded from the regression. The switch from completely uncertain information to certain information is too abrupt. A smoother way is proposed below.


Case B. Prior knowledge on $\alpha$ is of the form $\alpha \sim N[\mu, \sigma^2_\beta I]$. Now, $\mu = (\mu_1, \ldots, \mu_K)'$ and

$$\mu_k = \begin{cases} \bar{\alpha}, & k \in I, \\ 0, & k \in J, \end{cases}$$

where $\bar{\alpha} \ne 0$. The Bayes estimator in this case is modified and written as

$$\hat{\alpha}_i = \begin{cases} \left[\dfrac{\lambda_i\sigma^2_\beta}{\sigma^2 + \lambda_i\sigma^2_\beta}\right]\hat{\alpha}_i(0) + \left[\dfrac{\sigma^2}{\sigma^2 + \lambda_i\sigma^2_\beta}\right]\mu_i, & i \in I, \\ 0, & i \in J, \end{cases}$$

given $\sigma^2$, $\sigma^2_\beta$, $y$ and $\lambda_i$ for all $i$. In the special case when the least squares estimator $\hat{\alpha}_i(0)$ exactly equals $\bar{\alpha}$, the Bayes estimator is identical with the principal component estimator. The corresponding estimator of $\beta$ is recovered from the inverse transformation $\beta = A\alpha$. Also, note that this estimator is a special case of the ridge estimator evaluated above in a Bayesian way. This case is probably too unrealistic to expect in practice, and hence we propose a modified principal component estimator that will be a Bayes estimator and admissible.
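A minimal sketch of the Case B formula, assuming the excluded components are set to zero (as in Case C) and using hypothetical values for $\hat{\alpha}(0)$, $\lambda_i$, $\sigma^2$ and $\sigma^2_\beta$. It confirms that when $\hat{\alpha}_i(0) = \bar{\alpha}$ for the included components, the estimator coincides with the principal component estimator, and that otherwise each included component is pulled between $\hat{\alpha}_i(0)$ and $\bar{\alpha}$.

```python
import numpy as np


def case_b_estimator(alpha0, lam, included, alpha_bar, sigma2, sigma2_beta):
    """Case B Bayes estimator of alpha: shrink included components toward alpha_bar, zero the rest."""
    alpha0 = np.asarray(alpha0, dtype=float)
    lam = np.asarray(lam, dtype=float)
    w = lam * sigma2_beta / (sigma2 + lam * sigma2_beta)   # weight on the least squares estimate
    est = w * alpha0 + (1.0 - w) * alpha_bar                # (1 - w) = sigma2 / (sigma2 + lam sigma2_beta)
    mask = np.zeros(alpha0.shape, dtype=bool)
    mask[list(included)] = True
    est[~mask] = 0.0                                        # excluded components are set to zero
    return est


# Hypothetical inputs: three components, the first two included.
lam = np.array([5.0, 2.0, 0.1])
est_equal = case_b_estimator([1.5, 1.5, 0.7], lam, [0, 1], alpha_bar=1.5, sigma2=1.0, sigma2_beta=0.5)
print(est_equal)   # the principal component estimator (1.5, 1.5, 0)
```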


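Case C. Modified Principal Component Regression: The prior mean on $\alpha$ is assumed to be $\mu = (\mu_1, \ldots, \mu_K)'$ and

$$\mu_k = \begin{cases} \bar{\alpha}, & k \in I, \\ 0, & k \in J, \end{cases}$$

where $\bar{\alpha} \ne 0$. The Bayes estimator of $\alpha$ is

$$\hat{\alpha}_i = \begin{cases} E[\eta_i \mid y, X]\,\hat{\alpha}_i(0) + E[(1 - \eta_i) \mid y, X]\,\bar{\alpha}, & i \in I, \\ 0, & i \in J, \end{cases}$$

where $\eta_i = \lambda_i\sigma^2_\beta/(\lambda_i\sigma^2_\beta + \sigma^2)$. The corresponding estimator of $\beta$ is recovered from the inverse transformation $\beta = A\alpha$. Also, note that this estimator is a special case of the ridge estimator evaluated above in a Bayesian way.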
Bayesian Inference in the Linear Model


We consider a pure Bayesian approach to the problem of 
estimation of unknown parameters in the linear regression 
model. Priors are assumed on the unknown parameters and 
estimation proceeds with the incorporation of information 
embodied in these priors. The variance of the error term in 
the linear model and the prior variances are assumed to be 
unknown. The analysis becomes purely or completely 
Bayesian if we assume priors on the unknown variances in 
the model and then proceed to estimate the unknown 
parameters. Estimation of the parameters is done by 
combining the sample information with the prior knowledge 
on the unknown parameters and variances in the model. 


Bayes estimators are derived for the unknown parameters of the models. Two important estimation procedures that are related to the Bayes estimator are considered, namely Ridge regression and Principal component regression. The ridge estimators are seen to be derived from a class of proper priors assumed on the parameters; a modified version of the principal component estimator is obtained from a class of proper priors. These estimators are seen to be Bayes and are admissible.


The linear model that is assumed has been discussed at great length in Chapter 2. In addition to the assumptions already discussed, the disturbance vector will be assumed to be normal, together with a normal prior on $\beta$. The model that will be considered is written in its most general form as

$$y = X\beta + \varepsilon \qquad (3.1)$$

where:

y : $T \times 1$ vector of observations on the dependent variable

X : $T \times K$ matrix of explanatory variables, assumed to be stochastic and distributed independently of $(\beta, \sigma^2)$.

ε : $T \times 1$ error vector, assumed to be normal with mean zero and a scalar covariance matrix $\sigma^2 I$, given $X$.

β : $K \times 1$ vector of parameters, assumed to have a multivariate normal prior with mean $\mu$ and covariance matrix $\Sigma_\beta$.


The analysis of this model will proceed in a Bayesian framework by expressing the uncertainty associated with the unknown parameter vector $\beta$ in the form of a prior. Specifically, a multivariate normal prior is assumed on $\beta$. This viewpoint is consistent with the following: the investigator believes $\beta$ to be truly randomly distributed with the specified multivariate normal distribution. If $\beta$ were random, then the true prior might not always be known. As Berger (1982) points out, “Usually, only subjective approximations to the true prior can be constructed.” In the Classical framework, such a model would translate into a random effects model as distinct from a fixed effects model, although such a distinction is not very meaningful in Bayesian analysis. In the classical framework, in a random effects model, $\beta$ is assumed to be random rather than an unknown constant vector. If $\beta$ were assumed to be unknown and fixed, the model would be a fixed effects model; with $\beta$ assumed to be random, it is called a ‘random effects model’. Inference would then be discussed in the context of a random effects model wherein a prior is assumed on the vector of random effects $\beta$ in the model. [See Scheffe (1957), p. 6].


Description of the Transformation: This general model can be reduced to simpler models using a transformation that simultaneously diagonalizes $(X'X)^{-1}$ and $\Sigma_\beta$. The transformation involves a matrix that depends on the uncertainty in $\Sigma_\beta$ and on the data $X$. This enables us to transform the parameter space of $\beta$ to the parameter space of $\alpha$. Applying this transformation, we obtain a simplified model, which will be studied in detail. Consider the model specified in (3.1) and assume that both $\Sigma_\beta$ and $X'X$ are positive definite. For fixed and observed $X$, there exists a nonsingular, random matrix $A$ $(K \times K)$ which depends on $X$ and $\Sigma_\beta$; the randomness in $A$ comes from the uncertainty in $\Sigma_\beta$, as the entire analysis will be conditioned on the data $X$. This matrix $A$ is such that


$$A'(X'X)^{-1}A = I \qquad \text{and} \qquad A'\Sigma_\beta A = \Sigma_\alpha,$$

where $\Sigma_\alpha = \mathrm{Diag}(\sigma^2_{\alpha_1}, \ldots, \sigma^2_{\alpha_K})$ and $\sigma^2_{\alpha_i}$, for all $i$, are the roots of the determinantal equation $|\Sigma_\beta - \sigma^2_\alpha(X'X)^{-1}| = 0$. Since $\Sigma_\beta$ is positive definite, $\sigma^2_{\alpha_i} > 0$ for all $i$, and we order the roots as $\sigma^2_{\alpha_1} \ge \sigma^2_{\alpha_2} \ge \cdots \ge \sigma^2_{\alpha_K}$. These characteristic roots $\sigma^2_{\alpha_i}$ are defined to be the characteristic roots of $\Sigma_\beta$ in the metric of $(X'X)^{-1}$. [See Anderson (1984), p. 308]. This simultaneous diagonalization of $(X'X)^{-1}$ and $\Sigma_\beta$ is based on the following theorem, which is stated without proof: Consider a positive definite matrix $A$ and a positive semidefinite matrix $B$. Then there exists a nonsingular matrix $F$ such that

$$F'BF = \mathrm{Diag}(\lambda_1, \ldots, \lambda_K) \qquad \text{and} \qquad F'AF = I,$$

where $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_K \ge 0$ are the roots of the equation $|B - \lambda A| = 0$. If $B$ is positive definite, then $\lambda_i > 0$ for all $i$.
The model in (3.1) is now written as

$$y = Z\alpha + \varepsilon \qquad (3.2)$$

where

$$Z = X(A')^{-1}, \qquad Z'Z = I_K, \qquad \alpha = A'\beta.$$
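The simultaneous diagonalization can be carried out with a generalized symmetric eigensolver. The sketch assumes SciPy's `scipy.linalg.eigh(a, b)`, which solves $a v = w\,b v$ and returns eigenvectors normalized so that $V'bV = I$ and $V'aV = \mathrm{Diag}(w)$. With $a = \Sigma_\beta$ and $b = (X'X)^{-1}$, the columns give $A$ and the roots $w$ are the $\sigma^2_{\alpha_i}$. The data and $\Sigma_\beta$ are simulated for the illustration.

```python
import numpy as np
from scipy.linalg import eigh

rng = np.random.default_rng(7)
T, K = 40, 3
X = rng.normal(size=(T, K))
M = rng.normal(size=(K, K))
Sigma_beta = M @ M.T + K * np.eye(K)               # an arbitrary positive definite prior covariance
XtX_inv = np.linalg.inv(X.T @ X)

roots, A = eigh(Sigma_beta, XtX_inv)               # solves Sigma_beta v = s (X'X)^{-1} v
order = np.argsort(roots)[::-1]                    # sigma2_alpha_1 >= ... >= sigma2_alpha_K
roots, A = roots[order], A[:, order]

Z = X @ np.linalg.inv(A.T)                         # Z = X (A')^{-1}
beta = rng.normal(size=K)
alpha = A.T @ beta                                 # alpha = A' beta

print(roots)                                       # the characteristic roots sigma2_alpha_i
```

Reordering the columns keeps both normalizations intact, so $A'(X'X)^{-1}A = I$, $A'\Sigma_\beta A = \Sigma_\alpha$, $Z'Z = I$ and $Z\alpha = X\beta$ all hold.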


The model (3.2) is obtained from model (3.1) by a reparametrization that involves the transformation matrix $A$, which depends on $X$ and $\Sigma_\beta$, as described earlier. Our entire analysis will be conditional on the dataset $X$, and hence model (3.2) will also be conditional on $X$. The randomness in $A$ will be due to the uncertainty in $\Sigma_\beta$. Hence, the prior assumed on $(\beta, \Sigma_\beta)$ induces a prior distribution on $(\alpha, \Sigma_\alpha)$, which depends on the transformation. The prior on $\Sigma_\beta$ completely determines the prior on $\Sigma_\alpha$, given the data $X$. We obtain the prior distribution on $\alpha$ as follows: $E(\alpha \mid \Sigma_\beta, X) = A'\mu$ and $\mathrm{Cov}(\alpha \mid \Sigma_\beta, X) = A'\Sigma_\beta A = \Sigma_\alpha$. Given $\Sigma_\beta$ and $X$, the transformation matrix is completely determined, and hence the prior on $\alpha$ is $N[A'\mu, \Sigma_\alpha]$.


This analysis would not be complete if it did not allow for possible misspecification of the priors assumed. We seek to make the analysis more complete and robust by incorporating the induced priors on the prior variances $\sigma^2_{\alpha_i}$, for all $i$, as these priors allow for variation in the elements of $\Sigma_\alpha$, and correspondingly in $\Sigma_\beta$, capturing any misspecification in $\Sigma_\beta$, and hence in the prior on $\beta$, for a specific realization of $X$. At the same time, the problem of misspecifying priors on $\sigma^2_{\alpha_i}$ for all $i$ could still persist.
Sample evidence is then combined with prior information on the unknown parameters to yield the posterior distribution of the unknown parameters by the application of Bayes' theorem. This will enable us to obtain the posterior distribution of the shrinkage factors of the ridge estimators and modified principal component estimators. Using a Taylor series expansion, the posterior mean of these shrinkage factors is expressed as a ratio of infinite sums of hypergeometric functions. This enables us to obtain Bayes estimators of the unknown parameters of the model (which are also admissible) in some special cases. Important special cases of this kind will be considered.


3.1: The Posterior Distribution of $(\sigma^2, \sigma^2_{\alpha_1}, \ldots, \sigma^2_{\alpha_K})$


The focus of this section will be to derive the posterior distribution of $(\sigma^2, \sigma^2_{\alpha_1}, \ldots, \sigma^2_{\alpha_K})$ in the context of the linear model given by (3.2). Next, we define shrinkage factors in terms of the variance components $(\sigma^2, \sigma^2_{\alpha_1}, \ldots, \sigma^2_{\alpha_K})$, and the joint posterior distribution of these shrinkage factors is obtained. This will enable us to obtain the Bayes estimator of $\alpha$ and hence that of $\beta$. We will begin with a brief review of the Bayes estimator of $\alpha$ that has been discussed in Chapter 2.

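The Bayes estimator of $\alpha$ for the model in (3.2) is written as

$$\hat{\alpha}(B) = E[\alpha \mid y, X, \Sigma_\beta] = (Z'Z + \sigma^2\Sigma_\alpha^{-1})^{-1}(Z'Z)\hat{\alpha}(0) + \left[I - (Z'Z + \sigma^2\Sigma_\alpha^{-1})^{-1}(Z'Z)\right]A'\mu.$$

[This has been described in Chapter 2.] In terms of the reparametrized model (3.2), the GRE of $\alpha$ is written as

$$\hat{\alpha}(K) = (Z'Z + K)^{-1}(Z'Z)\hat{\alpha}(0),$$

given $K$, $X$, $\Sigma_\beta$ and $y$, where $K = \sigma^2\Sigma_\alpha^{-1}$ and $\hat{\alpha}(0)$ is the OLS estimator of $\alpha$. Thus,

$$\hat{\alpha}_i(K) = \left[\sigma^2_{\alpha_i}/(\sigma^2_{\alpha_i} + \sigma^2)\right]\hat{\alpha}_i(0), \quad \text{for all } i = 1, 2, \ldots, K,$$

given $\sigma^2$, $\Sigma_\beta$, $X$ and $y$, is the ridge estimator of $\alpha_i$. This follows from the fact that $Z'Z = I$.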

For the case when the prior mean on $\beta$, and hence on $\alpha$, is zero, i.e., when $\mu = 0$, the Bayes estimator coincides with the ridge estimator, that is, $\hat{\alpha}(K) = \hat{\alpha}(B)$, and hence the Bayes estimator of $\beta$, say $\hat{\beta}(B)$, equals $\hat{\beta}(K)$. In the case that the prior mean on $\beta$ is not equal to zero, the Bayes estimator equals a modified version of the ridge estimator, as shown in Chapter 2. Without loss of generality, it will be assumed in the rest of the analysis in this chapter that the prior mean on $\beta$ is zero, that is, $\mu = 0$. This is a reasonable assumption to make in some econometric models, as the unknown parameters can vary between large negative and large positive values. $\mu$ can take on nonzero values in other models, and this analysis can easily be applied to those models. However, since the focus of this section is ridge estimation, we will analyse the case where the prior mean is zero, which corresponds to the ridge estimator as commonly presented in the econometrics literature. Thus,

$$\hat{\alpha}_i(B) = \hat{\alpha}_i(K) = \left[\sigma^2_{\alpha_i}/(\sigma^2_{\alpha_i} + \sigma^2)\right]\hat{\alpha}_i(0) \quad \text{for all } i = 1, 2, \ldots, K,$$

given $\sigma^2$, $\Sigma_\beta$, $X$ and $y$. Hence, the Bayes estimator of $\alpha_i$ is written as

$$\hat{\alpha}_i(B) = E\left[\sigma^2_{\alpha_i}/(\sigma^2_{\alpha_i} + \sigma^2) \mid X, \Sigma_\beta, y\right]\hat{\alpha}_i(0) \quad \text{for all } i.$$

This expectation will be evaluated later in this chapter without conditioning on $\Sigma_\beta$; this will allow for variation in $\sigma^2$ and in $\sigma^2_{\alpha_i}$ for all $i$.
Alternatively, we could have analysed the untransformed model given in (3.1) and directly evaluated the shrinkage factors as $E[(X'X + K)^{-1} \mid y, X]$, which would enable us to obtain the Bayes estimator of $\beta$. The term $(X'X + K)^{-1}$ could be expanded using a power series, and the expectation could be obtained by making assumptions on $K$. However, in this analysis, the approach has been to transform the model, obtain the expectation of the shrinkage factors and the estimator of $\alpha$, and then recover the estimator of $\beta$. The priors assumed on the variance components are explained next.


We shall derive and study the posterior distribution of $\sigma^2_{\alpha_i}/(\sigma^2_{\alpha_i} + \sigma^2)$, employing prior knowledge on $\sigma^2, \sigma^2_{\alpha_1}, \ldots, \sigma^2_{\alpha_K}$. The inverted gamma prior is assumed on $\sigma^2$. The ‘prior’ on $\sigma^2_{\alpha_1}, \ldots, \sigma^2_{\alpha_K}$ is induced by the prior that is assumed on $\Sigma_\beta$. This prior depends on the data $X$ and represents an updating of the investigator's beliefs about $\sigma^2_{\alpha_1}, \ldots, \sigma^2_{\alpha_K}$ given $X$. The prior on $\sigma^2$ is from the inverted gamma family given by the class $P$:

$$p'(\sigma^2 \mid \lambda_0, C_0) \propto (\sigma^2)^{-(\lambda_0/2+1)}\exp\left[-C_0/2\sigma^2\right],$$

where $\lambda_0 > 0$, $C_0 \ge 0$. An inverted gamma prior on a nonnegative random variable $X$ with parameters $C_0$, $\lambda_0$ implies that $C_0/X$ is distributed as a chi-square with $\lambda_0$ degrees of freedom. For nonnegative variables that can vary in the range $[0, \infty)$, the inverted gamma distribution is desirable as a prior, as it has a fat tail allowing for greater variation in the variables being studied than other priors such as the gamma distribution.
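The link between the inverted gamma prior and the chi-square distribution can be checked by simulation. The sketch assumes SciPy's `scipy.stats.invgamma`, whose density with shape `a` and `scale` is proportional to $x^{-a-1}\exp(-\text{scale}/x)$, so shape $\lambda_0/2$ and scale $C_0/2$ reproduce the kernel above. The hyperparameter values are arbitrary choices for the example.

```python
import numpy as np
from scipy import stats

lam0, C0 = 5.0, 2.0                                  # assumed prior hyperparameters lambda_0 and C_0
# Density proportional to (sigma2)^{-(lam0/2 + 1)} exp(-C0 / (2 sigma2)):
# inverse gamma with shape lam0/2 and scale C0/2 in SciPy's parametrization.
prior = stats.invgamma(a=lam0 / 2, scale=C0 / 2)

draws = prior.rvs(size=200_000, random_state=np.random.default_rng(8))
chi2_draws = C0 / draws                              # should follow a chi-square with lam0 d.f.
print(chi2_draws.mean(), chi2_draws.var())           # close to lam0 and 2 * lam0
```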


The prior assumed on $\Sigma_\beta$ is an inverted Wishart distribution whose density function is written as

$$f(\Sigma_\beta \mid \Sigma_0, \nu, K) \propto |\Sigma_\beta|^{-(\nu+K+1)/2}\exp\left[-\tfrac{1}{2}\mathrm{tr}\left(\Sigma_\beta^{-1}\Sigma_0\right)\right],$$

where $\nu$, $K$ are the parameters of the inverted Wishart distribution and $\Sigma_0$ is a symmetric, positive definite matrix. For priors on a covariance matrix, it is reasonable and conventional to assume an inverted Wishart prior. [See DeGroot (1970), pp. 178-179]. This in turn induces a prior on the characteristic roots $\sigma^2_{\alpha_i}$ of $\Sigma_\beta$ in the metric of $(X'X)^{-1}$, which we derive as follows. The characteristic roots $\sigma^2_{\alpha_i}$, as seen before, satisfy the equation

$$|\Sigma_\beta - \sigma^2_\alpha(X'X)^{-1}| = 0.$$

From the result and the theorem stated before, there exists a nonsingular $(K \times K)$ matrix $C$ such that

$$C'(X'X)^{-1}C = \Lambda^{-1} \qquad \text{and} \qquad C'\Sigma_0 C = I.$$

By definition, the elements of $\Lambda$ are the eigenvalues of $X'X$ in the metric of $\Sigma_0^{-1}$. Now, since $\Sigma_\beta$ is inverted Wishart, $\Sigma_\beta^{-1}$ is Wishart with parameters $\nu$, $K$ and $\Sigma_0^{-1}$. [See Zellner (1971), p. 395]. Thus, $C^{-1}\Sigma_\beta^{-1}(C')^{-1}$ is Wishart with parameters $\nu$, $K$ and $I$. [See Anderson (1984), p. 162]. Thus, the symmetric matrix $H$ defined by

$$H = C'\Sigma_\beta C$$

is inverted Wishart with parameters $\nu$, $K$ and $I$. It is easily seen that the symmetric matrix $B$ defined as

$$B = \Lambda^{1/2}H\Lambda^{1/2}$$

has its characteristic roots given by the diagonal elements of $\Sigma_\alpha$. As reasoned above, $B$ has an inverted Wishart distribution with parameters $\nu$, $K$ and $\Lambda$.

Starting with the assumed prior on $\Sigma_\beta$, we can obtain the induced prior on the characteristic roots $\sigma^2_{\alpha_i}$ for all $i$. We express the prior induced in the general case in the form of a theorem, and then consider important special cases separately.
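The construction of $C$, $H$ and $B$ can be checked by simulation. The sketch assumes SciPy's `scipy.stats.invwishart` (density proportional to $|\Sigma|^{-(\nu+K+1)/2}\exp[-\tfrac{1}{2}\mathrm{tr}(\Psi\Sigma^{-1})]$) and `scipy.linalg.eigh(a, b)` for the generalized eigenproblem; the values of $X$, $\nu$ and $\Sigma_0$ are arbitrary assumptions. For one draw of $\Sigma_\beta$ it confirms that the eigenvalues of $B = \Lambda^{1/2}C'\Sigma_\beta C\Lambda^{1/2}$ are the roots $\sigma^2_{\alpha_i}$ of $|\Sigma_\beta - \sigma^2_\alpha(X'X)^{-1}| = 0$.

```python
import numpy as np
from scipy.linalg import eigh
from scipy.stats import invwishart

rng = np.random.default_rng(9)
T, K, nu = 40, 3, 8                                   # assumed sample size, dimension and d.f.
X = rng.normal(size=(T, K))
XtX_inv = np.linalg.inv(X.T @ X)
Sigma0 = np.diag([2.0, 1.0, 0.5])                     # assumed inverted Wishart scale matrix

# C with C'(X'X)^{-1}C = Lambda^{-1} and C' Sigma0 C = I; mu_i = 1 / lambda_i
mu, C = eigh(XtX_inv, Sigma0)
Lam_half = np.diag(np.sqrt(1.0 / mu))                 # Lambda^{1/2}

Sigma_beta = invwishart(df=nu, scale=Sigma0).rvs(random_state=rng)
H = C.T @ Sigma_beta @ C
B = Lam_half @ H @ Lam_half

roots_B = np.sort(np.linalg.eigvalsh(B))[::-1]
roots_direct = np.sort(eigh(Sigma_beta, XtX_inv, eigvals_only=True))[::-1]
print(np.allclose(roots_B, roots_direct))
```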
Theorem 3.1: Consider a matrix $B$ $(K \times K)$ which has an inverted Wishart distribution with parameters $\nu$, $K$ and $\Lambda$. The eigenvalues of $B$ are given by the diagonal elements $\sigma^2_{\alpha_i}$ of $\Sigma_\alpha$, for all $i$, and are ordered as $\sigma^2_{\alpha_1} \ge \sigma^2_{\alpha_2} \ge \cdots \ge \sigma^2_{\alpha_K} > 0$. Then the joint distribution of the ordered eigenvalues of $B$ is given by

$$p(\sigma^2_{\alpha_1}, \ldots, \sigma^2_{\alpha_K}) \propto \prod_{i=1}^{K}(\sigma^2_{\alpha_i})^{-(\nu-K-1)/2-2}\exp\left(-\frac{\lambda_0}{2\sigma^2_{\alpha_i}}\right)\prod_{i<l}\left(\frac{1}{\sigma^2_{\alpha_l}} - \frac{1}{\sigma^2_{\alpha_i}}\right)\sum_{j=0}^{\infty}\sum_{\kappa(j)}\frac{C_\kappa(-\delta)\,C_\kappa(\Sigma_\alpha^{-1})}{j!\,C_\kappa(I)},$$

where:

κ(j) : a notation for a partition of the number $j$; a partition of weight $r$ is a set of $r$ positive integers $(j_1, j_2, \ldots, j_r)$ such that $\sum_{i=1}^{r} j_i = j$.

$C_\kappa(Z)$ : the zonal polynomial corresponding to the partition $\kappa(j)$, which is a symmetric homogeneous polynomial function of degree $j$ in the latent roots of $Z$.

δ : a diagonal matrix with elements $\delta_i = (\lambda_i - \lambda_0)/2$ for all $i$.

I : the $K$-dimensional identity matrix.
Proof: Since $B$ has an inverted Wishart distribution, $B^{-1}$ has a Wishart distribution with parameters $\nu$, $K$ and $\Lambda^{-1}$, with eigenvalues given by $1/\sigma^2_{\alpha_i}$. Using the result stated in Johnson and Kotz (1972), p. 192, we can obtain the joint distribution of the characteristic roots $1/\sigma^2_{\alpha_i}$, for all $i$, as

$$p\left(\frac{1}{\sigma^2_{\alpha_1}}, \ldots, \frac{1}{\sigma^2_{\alpha_K}}\right) \propto \prod_{i=1}^{K}\left(\frac{1}{\sigma^2_{\alpha_i}}\right)^{(\nu-K-1)/2}\exp\left(-\frac{\lambda_0}{2\sigma^2_{\alpha_i}}\right)\prod_{i<l}\left(\frac{1}{\sigma^2_{\alpha_l}} - \frac{1}{\sigma^2_{\alpha_i}}\right)\sum_{j=0}^{\infty}\sum_{\kappa(j)}\frac{C_\kappa(-\delta)\,C_\kappa(\Sigma_\alpha^{-1})}{j!\,C_\kappa(I)}.$$

Transforming to the $(\sigma^2_{\alpha_1}, \ldots, \sigma^2_{\alpha_K})$ space, we obtain the distribution of $(\sigma^2_{\alpha_1}, \ldots, \sigma^2_{\alpha_K})$ stated in the theorem. Note that the Jacobian of the transformation is $\prod_{i=1}^{K}(\sigma^2_{\alpha_i})^{-2}$, which yields the resulting distribution.
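From the result of the theorem stated above, we can obtain the joint prior of $(\sigma^2_{\alpha_1}, \ldots, \sigma^2_{\alpha_K})$ in the general case as

$$p'(\sigma^2_{\alpha_1}, \ldots, \sigma^2_{\alpha_K}) \propto \prod_{i=1}^{K}(\sigma^2_{\alpha_i})^{-(\nu-K-1)/2-2}\exp\left(-\frac{\lambda_0}{2\sigma^2_{\alpha_i}}\right) \times \prod_{i<l}\left(\frac{1}{\sigma^2_{\alpha_l}} - \frac{1}{\sigma^2_{\alpha_i}}\right){}_0F_0(-\delta, \Sigma_\alpha^{-1}),$$

where

$${}_0F_0(-\delta, \Sigma_\alpha^{-1}) = \sum_{j=0}^{\infty}\sum_{\kappa(j)}\frac{C_\kappa(-\delta)\,C_\kappa(\Sigma_\alpha^{-1})}{j!\,C_\kappa(I)}.$$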
Since it is not possible to write out the zonal polynomials $C_\kappa(Z)$ explicitly in the general case [see Johnson and Kotz (1972), pp. 170-175], further analysis becomes very difficult. Hence, we shall consider important special cases, and one of these will be developed fully in this analysis. In these special cases, we make use of another theorem, stated below without proof. The term corresponding to the zonal polynomials in the general case drops out, resulting in a simpler prior distribution. We will begin by stating the theorem and then obtain the prior distribution in each of the two special cases to be considered.

To obtain the joint density of the characteristic roots of $B$, we make use of the following result in Anderson (1984), p. 318 [Theorem 13.3.1]:

Theorem 3.2: If the symmetric matrix $B$ has a density of the form $g(b_1, b_2, \ldots, b_K)$, where $b_1 > b_2 > \cdots > b_K$ are the characteristic roots of $B$, then the joint distribution of the roots is

$$c\,g(b_1, b_2, \ldots, b_K)\prod_{i<j}(b_i - b_j),$$

where $c$ is a constant.


Case 1. The first case assumes that the eigenvalues of $X'X$ in the metric of $\Sigma_\beta$ are equal, that is,

$$\lambda_1=\lambda_2=\cdots=\lambda_K=\lambda.$$


In practice, there will be cases where these eigenvalues are approximately equal, and this assumption will be relevant for such cases. Consider the matrix $B$ defined earlier (see note 5). In general, the matrix $B$ given the data $X$ has an inverted Wishart distribution with parameters $v$, $K$ and $\Lambda$, as seen before. Under this assumption, $B$ given the data $X$ is inverted Wishart with parameters $\lambda I$, $K$ and $v$. The density of $B$ is written as

$$f(B)\propto|B|^{-(v+K+1)/2}\exp\left(-\frac{\lambda}{2}\operatorname{tr}B^{-1}\right),$$

and further simplification, using $|B|=\prod_{i}\sigma^2_{\alpha_i}$ and $\operatorname{tr}B^{-1}=\sum_{i}1/\sigma^2_{\alpha_i}$, yields

$$f(B)\propto\prod_{i=1}^{K}(\sigma^2_{\alpha_i})^{-(v+K+1)/2}\exp\left(-\frac{\lambda}{2\sigma^2_{\alpha_i}}\right).$$


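We apply Theorem 3.2 to the symmetric matrix $B$; the induced prior distribution of the ordered roots is

$$p'(\sigma^2_{\alpha_1},\sigma^2_{\alpha_2},\ldots,\sigma^2_{\alpha_K})\propto\prod_{i=1}^{K}(\sigma^2_{\alpha_i})^{-(v+K+1)/2}\exp\left(-\frac{\lambda}{2\sigma^2_{\alpha_i}}\right)\prod_{i<j}(\sigma^2_{\alpha_i}-\sigma^2_{\alpha_j}).$$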
Let $G$ be the matrix of normalized characteristic vectors of $B$ (so that $G'G=GG'=I$). Applying Theorem 13.3.4 of Anderson (1984), p. 322, to the matrix $B$, we see that $G$ is distributed independently of the characteristic roots $\sigma^2_{\alpha_i}$ for all $i$. $G$ is distributed according to a conditional Haar invariant distribution [see Kshirsagar (1972), p. 441]. These results will be used later.
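As a sketch only, the Case 1 prior of the ordered roots derived above can be evaluated on the log scale, which is how such unnormalised densities are usually handled numerically. The function name `log_prior_roots_case1` and the test values are hypothetical; the code simply transcribes the density and returns minus infinity outside its support.

```python
import math


def log_prior_roots_case1(roots, v, lam):
    """Unnormalised log prior of ordered roots r_1 > r_2 > ... > r_K (Case 1):
    sum_i [-(v + K + 1)/2 * log r_i - lam / (2 r_i)] + sum_{i<j} log(r_i - r_j)."""
    K = len(roots)
    # Outside the support: non-positive roots or roots not strictly decreasing.
    if any(r <= 0 for r in roots) or any(roots[i] <= roots[i + 1] for i in range(K - 1)):
        return -math.inf
    total = sum(-(v + K + 1) / 2 * math.log(r) - lam / (2 * r) for r in roots)
    # Factor prod_{i<j} (r_i - r_j) contributed by Theorem 3.2.
    total += sum(math.log(roots[i] - roots[j]) for i in range(K) for j in range(i + 1, K))
    return total


value = log_prior_roots_case1([2.0, 1.0], v=4, lam=2.0)
```

Working on the log scale avoids underflow when $K$ is large or the roots are spread over several orders of magnitude.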


Case 2. Another special case arises when some of the eigenvalues are near zero and the others are all equal. Such cases are described by the following assumption:

$$\lambda_1=\lambda_2=\cdots=\lambda_m=\lambda\quad\text{and}\quad\lambda_{m+1}=\lambda_{m+2}=\cdots=\lambda_K=0.$$

Let $H_m$ denote the corresponding $m$-dimensional submatrix of $H$. Then, by a result of Zellner (1971), p. 395, $H_m$ has the inverted Wishart distribution with parameters $\lambda I_m$ and $(v-K+m)$.


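Proceeding as above, the induced prior density of the characteristic roots of $H_m$ is

$$p'(\sigma^2_{\alpha_1},\ldots,\sigma^2_{\alpha_m})\propto\prod_{i=1}^{m}(\sigma^2_{\alpha_i})^{-(v+K+1)/2}\exp\left(-\frac{\lambda}{2\sigma^2_{\alpha_i}}\right)\prod_{i<j}(\sigma^2_{\alpha_i}-\sigma^2_{\alpha_j}).$$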


We have considered important special cases and derived the joint prior on $(\sigma^2_{\alpha_1},\sigma^2_{\alpha_2},\ldots,\sigma^2_{\alpha_K})$ in each of them.


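We will consider Case 1 and analyse it further. The joint prior of $(\sigma^2,\sigma^2_{\alpha_1},\ldots,\sigma^2_{\alpha_K})$ is

$$p'(\sigma^2,\sigma^2_{\alpha_1},\ldots,\sigma^2_{\alpha_K})\propto(\sigma^2)^{-\lambda/2-1}\exp\left(-\frac{C_0}{2\sigma^2}\right)\prod_{i=1}^{K}(\sigma^2_{\alpha_i})^{-(v+K+1)/2}\exp\left(-\frac{\lambda}{2\sigma^2_{\alpha_i}}\right)\prod_{i<j}(\sigma^2_{\alpha_i}-\sigma^2_{\alpha_j}),$$

given $\lambda$, $C_0$, $v$, $K$ and $Z$, since the prior assumed on $\sigma^2$ and the priors induced on $\sigma^2_{\alpha_i}$ for all $i$ are independent of each other. The likelihood function of the sample as a function of $\alpha$ and $\sigma^2$ is

$$L(\alpha,\sigma^2)\propto(\sigma^2)^{-T/2}\exp\left[-\frac{1}{2\sigma^2}(y-Z\alpha)'(y-Z\alpha)\right].$$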


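Since a prior is assumed on $\alpha$, from the Bayesian point of view the likelihood function can be viewed as a function of $(\sigma^2,\sigma^2_{\alpha_1},\ldots,\sigma^2_{\alpha_K})$; it is derived as follows. The distribution of $y$ is obtained by incorporating the prior on $\alpha$. The first two moments of $y$ are:

1. $E(y\mid X,\Sigma_\alpha)=0$.
2. $\operatorname{Cov}(y\mid X,\Sigma_\alpha)=E(Z\alpha\alpha'Z'\mid X,\Sigma_\alpha)+E(\varepsilon\varepsilon'\mid X,\Sigma_\alpha)+E(Z\alpha\varepsilon'\mid X,\Sigma_\alpha)+E(\varepsilon\alpha'Z'\mid X,\Sigma_\alpha)$.

Since

$$E[Z\alpha\varepsilon'\mid X,\Sigma_\alpha]=E[E(Z\alpha\varepsilon'\mid X,\Sigma_\alpha,\alpha)]=E[Z\alpha\,E(\varepsilon'\mid X,\Sigma_\alpha,\alpha)]=0,$$

and likewise for its transpose, we have $\Sigma=\operatorname{Cov}(y\mid X,\Sigma_\alpha)=\sigma^2[I+Z\Gamma Z']$, where $\Gamma=\operatorname{Diag}(\sigma^2_{\alpha_1}/\sigma^2,\sigma^2_{\alpha_2}/\sigma^2,\ldots,\sigma^2_{\alpha_K}/\sigma^2)$.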


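Hence, $y\sim N[0,\Sigma]$.⁶ The likelihood function is written as

$$L(\sigma^2,\sigma^2_{\alpha_1},\ldots,\sigma^2_{\alpha_K})\propto|\Sigma|^{-1/2}\exp\left(-\tfrac{1}{2}\,y'\Sigma^{-1}y\right),$$

given $X$. By Bayes' theorem, the posterior distribution of $(\sigma^2,\sigma^2_{\alpha_1},\ldots,\sigma^2_{\alpha_K})$ is

$$p''(\sigma^2,\sigma^2_{\alpha_1},\ldots,\sigma^2_{\alpha_K})\propto L(\sigma^2,\sigma^2_{\alpha_1},\ldots,\sigma^2_{\alpha_K})\,p'(\sigma^2)\,p'(\sigma^2_{\alpha_1},\ldots,\sigma^2_{\alpha_K}),$$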


and hence the posterior distribution of $(\sigma^2,\sigma^2_{\alpha_1},\ldots,\sigma^2_{\alpha_K})$ is

$$p''(\sigma^2,\sigma^2_{\alpha_1},\ldots,\sigma^2_{\alpha_K})\propto|\Sigma|^{-1/2}(\sigma^2)^{-\lambda/2-1}\prod_{i=1}^{K}(\sigma^2_{\alpha_i})^{-(v+K+1)/2}\prod_{i<j}(\sigma^2_{\alpha_i}-\sigma^2_{\alpha_j})\exp\left[-\frac{1}{2}\left(\frac{C_0}{\sigma^2}+y'\Sigma^{-1}y+\sum_{i=1}^{K}\frac{\lambda}{\sigma^2_{\alpha_i}}\right)\right],\qquad(3.3)$$

given $y$, $X$ and the transformation matrix $A$. This posterior density is simplified further by substituting simpler expressions for $|\Sigma|$ and for the quadratic form in $y$ [see Appendix A for details of this simplification; proofs are provided for the results used]. The simplified posterior density is written as


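$$p''(\sigma^2,\sigma^2_{\alpha_1},\ldots,\sigma^2_{\alpha_K})\propto(\sigma^2)^{-(T+\lambda)/2-1}\prod_{i=1}^{K}(1+\sigma^2_{\alpha_i}/\sigma^2)^{-1/2}(\sigma^2_{\alpha_i})^{-(v+K+1)/2}\prod_{i<j}(\sigma^2_{\alpha_i}-\sigma^2_{\alpha_j})\exp\left\{-\frac{1}{2\sigma^2}\left[Q_0+\sum_{i=1}^{K}\frac{\lambda\sigma^2}{\sigma^2_{\alpha_i}}-H(q,\Gamma)\right]\right\},$$

where $q=Z'y$, $Q_0=y'y+C_0$ and $H(q,\Gamma)=q'(I+\Gamma^{-1})^{-1}q$.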
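The simplification of $|\Sigma|$ and $y'\Sigma^{-1}y$ can be checked numerically. The sketch below assumes, as the simplified density implies, that the transformed regressors satisfy $Z'Z=I$; under that assumption $|I+Z\Gamma Z'|=\prod_i(1+\gamma_i)$ and $y'(I+Z\Gamma Z')^{-1}y=y'y-H(q,\Gamma)$, with $H(q,\Gamma)=\sum_iq_i^2\gamma_i/(1+\gamma_i)$ and $\gamma_i=\sigma^2_{\alpha_i}/\sigma^2$. The design matrix and the numbers are hypothetical.

```python
def solve_and_det(m, b):
    """Solve m x = b by Gaussian elimination with partial pivoting; also return det(m)."""
    n = len(m)
    a = [row[:] + [b[i]] for i, row in enumerate(m)]  # augmented matrix
    d = 1.0
    for i in range(n):
        p = max(range(i, n), key=lambda r: abs(a[r][i]))
        if p != i:
            a[i], a[p] = a[p], a[i]
            d = -d
        d *= a[i][i]
        for r in range(i + 1, n):
            factor = a[r][i] / a[i][i]
            for c in range(i, n + 1):
                a[r][c] -= factor * a[i][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        x[i] = (a[i][n] - sum(a[i][c] * x[c] for c in range(i + 1, n))) / a[i][i]
    return x, d


# Hypothetical T = 3, K = 2 design with orthonormal columns (Z'Z = I is the assumption).
Z = [[1.0, 0.0], [0.0, 0.6], [0.0, 0.8]]
gamma = [4.0, 0.25]  # gamma_i = sigma^2_alpha_i / sigma^2, the diagonal of Gamma
y = [1.0, 2.0, -1.0]
T, K = len(Z), len(gamma)

# M = I + Z Gamma Z', so that Sigma = sigma^2 * M.
M = [[(1.0 if r == c else 0.0) + sum(Z[r][k] * gamma[k] * Z[c][k] for k in range(K))
      for c in range(T)] for r in range(T)]
x, det_direct = solve_and_det(M, y)
quad_direct = sum(yi * xi for yi, xi in zip(y, x))  # y' M^{-1} y

q = [sum(Z[t][k] * y[t] for t in range(T)) for k in range(K)]  # q = Z'y
eta = [g / (1.0 + g) for g in gamma]                            # gamma_i / (1 + gamma_i)
quad_formula = sum(yi * yi for yi in y) - sum(qk * qk * ek for qk, ek in zip(q, eta))
det_formula = 1.0
for g in gamma:
    det_formula *= 1.0 + g                                      # prod_i (1 + gamma_i)
```

The identity behind the check is the Woodbury formula $(I+Z\Gamma Z')^{-1}=I-Z(\Gamma^{-1}+Z'Z)^{-1}Z'$, which collapses to the diagonal form only when $Z'Z=I$.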


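Consider the transformation to the $(\sigma^2,\eta_1,\ldots,\eta_K)$ space, where

$$\eta_i=\frac{\sigma^2_{\alpha_i}}{\sigma^2_{\alpha_i}+\sigma^2}\quad\text{for all }i,\qquad\sigma^2=\sigma^2.$$

The inverse transformation is given by

$$\sigma^2_{\alpha_i}=\frac{\sigma^2\eta_i}{1-\eta_i}\quad\text{for all }i,\qquad\sigma^2=\sigma^2.$$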
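A minimal sketch of the change of variables with hypothetical values. Because $\eta_i$ is increasing in $\sigma^2_{\alpha_i}$ for fixed $\sigma^2$, the ordering of the roots carries over to the $\eta_i$, and the inverse map recovers the roots.

```python
def to_eta(sigma2, sigma2_alpha):
    """eta_i = sigma^2_alpha_i / (sigma^2_alpha_i + sigma^2) for each root."""
    return [s / (s + sigma2) for s in sigma2_alpha]


def from_eta(sigma2, eta):
    """Inverse map sigma^2_alpha_i = sigma^2 * eta_i / (1 - eta_i); requires 0 < eta_i < 1."""
    if any(not 0.0 < e < 1.0 for e in eta):
        raise ValueError("each eta_i must lie strictly between 0 and 1")
    return [sigma2 * e / (1.0 - e) for e in eta]


sigma2 = 2.0                      # hypothetical error variance
sigma2_alpha = [20.0, 2.0, 0.5]   # hypothetical ordered roots
eta = to_eta(sigma2, sigma2_alpha)
recovered = from_eta(sigma2, eta)
```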


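Lemma 3.1: The Jacobian of the transformation from the $(\sigma^2,\sigma^2_{\alpha_1},\ldots,\sigma^2_{\alpha_K})$ space to the $(\sigma^2,\eta_1,\ldots,\eta_K)$ space (where $\eta_i$ is as defined above) is given by

$$\frac{(\sigma^2)^K}{\prod_{i=1}^{K}(1-\eta_i)^2}.$$

Proof: The Jacobian of the transformation is

$$\left|\frac{\partial(\sigma^2,\sigma^2_{\alpha_1},\ldots,\sigma^2_{\alpha_K})}{\partial(\sigma^2,\eta_1,\ldots,\eta_K)}\right|=|Q|,$$

where the matrix $Q$ is given by

$$Q=\begin{pmatrix}1&0&\cdots&0\\ \eta_1/(1-\eta_1)&\sigma^2/(1-\eta_1)^2&\cdots&0\\ \vdots&\vdots&\ddots&\vdots\\ \eta_K/(1-\eta_K)&0&\cdots&\sigma^2/(1-\eta_K)^2\end{pmatrix}.$$

Since $Q$ is lower triangular, its determinant is the product of its diagonal elements:

$$|Q|=\frac{(\sigma^2)^K}{\prod_{i=1}^{K}(1-\eta_i)^2}.$$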
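The determinant in Lemma 3.1 can be confirmed by finite differences. The sketch below (pure Python; the evaluation point is hypothetical) builds the Jacobian of the inverse map numerically and compares its determinant with $(\sigma^2)^K/\prod_i(1-\eta_i)^2$.

```python
def inverse_map(params):
    """(sigma^2, eta_1, ..., eta_K) -> (sigma^2, sigma^2_alpha_1, ..., sigma^2_alpha_K)."""
    sigma2, eta = params[0], params[1:]
    return [sigma2] + [sigma2 * e / (1.0 - e) for e in eta]


def numerical_jacobian(f, x, h=1e-6):
    """Central-difference Jacobian J[r][c] = d f_r / d x_c."""
    n = len(x)
    jac = [[0.0] * n for _ in range(n)]
    for c in range(n):
        xp, xm = list(x), list(x)
        xp[c] += h
        xm[c] -= h
        fp, fm = f(xp), f(xm)
        for r in range(n):
            jac[r][c] = (fp[r] - fm[r]) / (2 * h)
    return jac


def det(m):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in m]
    n = len(a)
    d = 1.0
    for i in range(n):
        p = max(range(i, n), key=lambda r: abs(a[r][i]))
        if a[p][i] == 0.0:
            return 0.0
        if p != i:
            a[i], a[p] = a[p], a[i]
            d = -d
        d *= a[i][i]
        for r in range(i + 1, n):
            factor = a[r][i] / a[i][i]
            for c in range(i, n):
                a[r][c] -= factor * a[i][c]
    return d


def lemma_3_1(params):
    """Closed form from Lemma 3.1: (sigma^2)^K / prod_i (1 - eta_i)^2."""
    sigma2, eta = params[0], params[1:]
    value = sigma2 ** len(eta)
    for e in eta:
        value /= (1.0 - e) ** 2
    return value


point = [1.5, 0.8, 0.4, 0.1]  # hypothetical (sigma^2, eta_1, eta_2, eta_3)
numeric = det(numerical_jacobian(inverse_map, point))
exact = lemma_3_1(point)
```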


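The posterior distribution of $(\sigma^2,\eta_1,\ldots,\eta_K)$ is

$$p''(\sigma^2,\eta_1,\ldots,\eta_K)\propto(\sigma^2)^{-\mathcal A/2-1}\prod_{i=1}^{K}\eta_i^{-(v+K+1)/2}(1-\eta_i)^{(v+K)/2-1}\prod_{i<j}f(\eta_i,\eta_j)\exp\left\{-\frac{1}{2\sigma^2}\left[Q_0+\sum_{i=1}^{K}\frac{\lambda(1-\eta_i)}{\eta_i}-\sum_{i=1}^{K}q_i^2\eta_i\right]\right\},$$

where $\mathcal A=T+\lambda+vK$, $0<\eta_i<1$ and

$$f(\eta_i,\eta_j)=\frac{\eta_i-\eta_j}{(1-\eta_i)(1-\eta_j)}.$$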


The posterior distribution of $(\eta_1,\eta_2,\ldots,\eta_K)$ is obtained by integrating with respect to $\sigma^2$:

$$p''(\eta_1,\eta_2,\ldots,\eta_K)\propto\frac{\prod_{i=1}^{K}\eta_i^{-(v+K+1)/2}(1-\eta_i)^{(v+K)/2-1}\prod_{i<j}f(\eta_i,\eta_j)}{[Q(\eta_1,\eta_2,\ldots,\eta_K)]^{\mathcal A/2}},$$

where

$$Q(\eta_1,\eta_2,\ldots,\eta_K)=Q_0+\sum_{i=1}^{K}\frac{\lambda(1-\eta_i)}{\eta_i}-\sum_{i=1}^{K}q_i^2\eta_i.$$

Now we simplify the term $\prod_{i<j}f(\eta_i,\eta_j)$ as follows:

$$\prod_{i<j}(\eta_i-\eta_j)=\prod_{i=1}^{K}\eta_i^{K-i}\prod_{j=i+1}^{K}(1-\eta_j/\eta_i)$$

and

$$\prod_{i<j}\frac{1}{(1-\eta_i)(1-\eta_j)}=\prod_{i=1}^{K}(1-\eta_i)^{-(K-1)},$$

since each $\eta_i$ appears in exactly $K-1$ of the pairs. This is further written as

$$\prod_{i<j}f(\eta_i,\eta_j)=\prod_{i=1}^{K}\left[\eta_i^{K-i}(1-\eta_i)^{-(K-1)}\prod_{j=i+1}^{K}(1-\eta_j/\eta_i)\right]\quad\text{for }K\ge2.$$
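The identity for $\prod_{i<j}f(\eta_i,\eta_j)$ is easy to spot-check at random ordered points. In this sketch the indices run from 0 in the code, so the text's exponent $K-i$ becomes `K - (i + 1)`; the random points are purely illustrative.

```python
import random


def lhs(eta):
    """prod_{i<j} f(eta_i, eta_j) with f = (eta_i - eta_j) / ((1 - eta_i)(1 - eta_j))."""
    value = 1.0
    K = len(eta)
    for i in range(K):
        for j in range(i + 1, K):
            value *= (eta[i] - eta[j]) / ((1 - eta[i]) * (1 - eta[j]))
    return value


def rhs(eta):
    """prod_i [eta_i^(K-i) (1 - eta_i)^(-(K-1)) prod_{j>i} (1 - eta_j / eta_i)], i counted from 1."""
    K = len(eta)
    value = 1.0
    for i in range(K):  # code index i = 0..K-1 corresponds to the text's i = 1..K
        value *= eta[i] ** (K - (i + 1)) * (1 - eta[i]) ** (-(K - 1))
        for j in range(i + 1, K):
            value *= 1 - eta[j] / eta[i]
    return value


random.seed(0)
eta = sorted((random.uniform(0.05, 0.95) for _ in range(4)), reverse=True)
```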


Substituting this in the posterior, we write the posterior density of $(\eta_1,\eta_2,\ldots,\eta_K)$ as

$$p''(\eta_1,\eta_2,\ldots,\eta_K)\propto\frac{\prod_{i=1}^{K}\eta_i^{K-i-(v+K+1)/2}(1-\eta_i)^{n}\prod_{j=i+1}^{K}(1-\eta_j/\eta_i)}{[Q(\eta_1,\eta_2,\ldots,\eta_K)]^{\mathcal A/2}}\quad\text{if all }\eta_i\in(0,1),$$

$$p''(\eta_1,\eta_2,\ldots,\eta_K)=0\quad\text{if any }\eta_i=0,\qquad(3.4)$$

where $n=(K+v)/2-K=(v-K)/2$, and the posterior density is written given the data $X$, $y$ and the transformation matrix $A$.


3.1.1: Comparison with the One Way Analysis of 
Variance Model 


The Bayesian inference and the derivation of the posterior distribution here have similarities with, and differences from, the Bayesian inference in the one way variance components model due to Hill (1977). We briefly present the one way variance components model and contrast the posterior density derived in (3.4) with the posterior density derived by Hill (1967).
1. The one way model of variance components is written as

$$y_{ij}=\mu+a_i+e_{ij},$$

where $i=1,2,\ldots,I$; $j=1,2,\ldots,J_i$; $N=\sum_iJ_i$. In addition, $e_{ij}\sim N[0,\sigma^2]$ and $a_i\sim N[0,\sigma_a^2]$. In each class $i$ there are $J_i$ measurements. If $J_i=J$ for all $i$, the model is said to be a balanced one way model; otherwise it is said to be unbalanced. This model is comparable to the transformed linear model specified in (3.2).

The model in (3.2) can be written as

$$y_t=\sum_{i=1}^{K}z_{ti}\alpha_i+\varepsilon_t,$$

where $t=1,2,\ldots,T$. Except for the constant term and the fact that there are no multiple measurements in each class $i$, this transformed model is similar to the one way variance components model. The role played by the $J_i$ observations in each class $i$ in the one way variance components model is played by the data matrix on the $K$ explanatory variables in our model (3.2) in the regression context.
2. In the case of the balanced one way variance components model, the posterior density of $\sigma_a^2/\sigma^2$ is derived by Hill (1967) [see p. 817]. In the context of other related models, a full Bayesian analysis is presented in Hill (1965), Hill (1967) and Hill (1980). The posterior derived in (3.4) resembles the posterior derived by Hill (1967) if we consider a model with only one explanatory variable (i.e., when $K=1$). An important special case is when $\sigma^2_{\alpha_1}=\sigma^2_{\alpha_2}=\cdots=\sigma^2_{\alpha_K}$, so that $\eta_1=\cdots=\eta_K=\eta$; the posterior density in (3.4) then reduces to

$$p''(\eta)\propto\frac{\eta^{(\cdots)}(1-\eta)^{(\cdots)}}{[Q_0+K\lambda(1-\eta)/\eta-(q'q)\eta]^{\mathcal A/2}}\quad\text{if }\eta\in(0,1),\qquad p''(\eta)=0\quad\text{if }\eta=0.$$


Hill (1967) derived the posterior density of $\tau^2=\sigma_a^2/\sigma^2$ in the context of the one way analysis of variance model. For comparison with the posterior density derived there, we obtain the posterior density of $\tau^2=\sigma^2_\alpha/\sigma^2$ from the posterior density of $\eta$ obtained above, using $\eta=\tau^2/(1+\tau^2)$, as

$$p''(\tau^2)\propto\frac{(\cdots)}{[Q_0+K\lambda(1+\tau^2)/\tau^2+(Q_0-q'q)\tau^2]^{\mathcal A/2}},\qquad\tau^2>0.$$
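The step from $\eta$ to $\tau^2$ is an ordinary change of variables with $d\eta/d\tau^2=1/(1+\tau^2)^2$. As a sketch, with a hypothetical Beta(2, 3) density standing in for $p''(\eta)$, the probability of an interval computed on either scale agrees.

```python
def eta_from_tau2(tau2):
    """eta = tau^2 / (1 + tau^2), with tau^2 = sigma^2_alpha / sigma^2."""
    return tau2 / (1.0 + tau2)


def density_tau2(p_eta, tau2):
    """Density of tau^2 induced by a density p_eta on (0, 1): p_eta(eta(tau^2)) * d eta / d tau^2."""
    return p_eta(eta_from_tau2(tau2)) / (1.0 + tau2) ** 2


def midpoint(f, a, b, n=20_000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (k + 0.5) * h) for k in range(n))


def p_eta(e):
    """Hypothetical example density on (0, 1): Beta(2, 3)."""
    return 12.0 * e * (1.0 - e) ** 2


a, b = 0.5, 3.0  # an interval for tau^2
mass_eta = midpoint(p_eta, eta_from_tau2(a), eta_from_tau2(b))
mass_tau2 = midpoint(lambda t: density_tau2(p_eta, t), a, b)
```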


The posterior density of $\tau^2$ derived by Hill (1967), p. 817, is

$$g''(\tau^2)\propto\frac{(\cdots)}{[(SSW+SSB)+\lambda(1+J\tau^2)/\tau^2+(SSW)J\tau^2]^{(\cdots)}},$$

where $\tau^2>0$ and SSW, SSB are the within and between sums of squares for the one way analysis of variance model. In comparing the two posterior densities, the following points need to be noted:

1. There are no multiple measurements $J$ within each class in the regression model considered. Hence, the term $(1+J\tau^2)$ is modified to $(1+\tau^2)$ in the posterior density $p''$.

2. Hill (1967) obtains the posterior density $g''(\tau^2)$ assuming a diffuse prior on $\sigma^2$. This alters the form of the posterior density $g''(\tau^2)$.


Except for these points of distinction, the two posterior densities are identical. To that extent, the posterior density of $(\eta_1,\eta_2,\ldots,\eta_K)$ in (3.4) can be seen as a generalization of the posterior derived in Hill (1967, p. 817) to a case with $K$ variance components.


3.2: Evaluation of the Posterior Mean of $\eta_i$


The posterior density of $(\eta_1,\eta_2,\ldots,\eta_K)$ has been obtained above. In principle, the posterior mean of $\eta_i$ for each $i$ can be obtained directly from this density by integration, but the exact expectation integrals are hard to evaluate; numerical integration techniques could yield these expectations directly. We propose instead an approach that expands the denominator of the posterior density in a Taylor series. This expansion enables us to write the posterior mean of $\eta_i$ as a ratio of infinite sums of hypergeometric functions, so that results developed in Hill (1977) can be used. Also, in some cases these functions are simple and can be evaluated directly.
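For a single shrinkage factor ($K=1$) the direct numerical route mentioned above is straightforward. The sketch below applies the midpoint rule on $(0,1)$ to any unnormalised log density; it is illustrated on a Beta(3, 5) density, whose mean $3/8$ is known, rather than on (3.4) itself, and the function name is hypothetical.

```python
import math


def posterior_mean_unit_interval(log_density, n=20_000):
    """Mean of a random variable on (0, 1) with unnormalised log density `log_density`,
    computed by the midpoint rule on n equal cells."""
    h = 1.0 / n
    grid = [(k + 0.5) * h for k in range(n)]
    logs = [log_density(e) for e in grid]
    peak = max(logs)  # rescale by the largest value to avoid overflow/underflow in exp
    weights = [math.exp(lv - peak) for lv in logs]
    return sum(e * w for e, w in zip(grid, weights)) / sum(weights)


# Hypothetical check: Beta(3, 5) has log density 2 log(e) + 4 log(1 - e) + const and mean 3/8.
mean = posterior_mean_unit_interval(lambda e: 2 * math.log(e) + 4 * math.log(1 - e))
```

In several dimensions the cost of such grids grows as $n^K$, which is one motivation for the series approach developed below.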


Thus, the posterior mean of $\eta_i$ for all $i$ is obtained using a Taylor series expansion, in several steps. First, the denominator of the posterior density derived above is modified and studied. Next, we expand the denominator using a Taylor series expansion and a multinomial expansion of some terms. This gives a modified posterior density of the shrinkage factors, which in turn yields their posterior mean. The posterior density is rewritten as

$$p''(\eta_1,\ldots,\eta_K)\propto\frac{\prod_{i=1}^{K}\eta_i^{K-i-(v+K+1)/2}(1-\eta_i)^{n}}{\prod_{i=1}^{K}\prod_{j=i+1}^{K}(1-\eta_j/\eta_i)^{-1}\,[Q(\eta_1,\ldots,\eta_K)]^{\mathcal A/2}}.$$


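The denominator of the posterior density written above is expressed as

$$\left\{Q(\eta_1,\ldots,\eta_K)\prod_{i=1}^{K}\prod_{j=i+1}^{K}(1-\eta_j/\eta_i)^{-2/\mathcal A}\right\}^{\mathcal A/2}=\left\{Q_1(\eta_1)Q_2(\eta_2)\cdots Q_K(\eta_K)\,m(\eta_1,\ldots,\eta_K)\right\}^{\mathcal A/2},$$

where

$$m(\eta_1,\ldots,\eta_K)=\frac{Q(\eta_1,\ldots,\eta_K)}{\prod_{i=1}^{K}Q_i(\eta_i)\prod_{j=i+1}^{K}(1-\eta_j/\eta_i)^{2/\mathcal A}},$$

$$Q_i(\eta_i)=(1-z_{2i-1}\eta_i)(1-z_{2i}(1-\eta_i)),\qquad0<z_{2i-1},z_{2i}<1,\ \text{for all }i.$$

This enables us to rewrite the posterior density of $(\eta_1,\ldots,\eta_K)$ as

$$p''(\eta_1,\ldots,\eta_K)\propto\prod_{i=1}^{K}G_i(\eta_i)\,[m(\eta_1,\ldots,\eta_K)]^{-\mathcal A/2},$$

where

$$G_i(\eta_i)=\frac{\eta_i^{K-i-(v+K+1)/2}(1-\eta_i)^{n}}{[Q_i(\eta_i)]^{\mathcal A/2}}.$$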
We will analyse the function $m(\eta_1,\ldots,\eta_K)$ in the next subsection.

3.2.1: Study of $m(\eta_1,\eta_2,\ldots,\eta_K)$

1. $Q(\eta_1,\ldots,\eta_K)$ is the numerator of $m(\eta_1,\ldots,\eta_K)$. It is a nonnegative function of $(\eta_1,\ldots,\eta_K)$; thus $Q(\eta_1,\ldots,\eta_K)\ge0$ for all $\eta_1,\ldots,\eta_K\in[0,1]$.


2. $Q(\eta_1,\ldots,\eta_K)$ is an analytic function on $[0,1]$ and, in particular, a continuous function of $(\eta_1,\ldots,\eta_K)$ on $[0,1]$; hence it is bounded. Thus, there exists a bound $\bar Q$ such that $0\le Q(\eta_1,\ldots,\eta_K)\le\bar Q$ for $\eta_i\in[0,1]$, for all $i$.
3. Similarly, the first term of the denominator of $m(\eta_1,\ldots,\eta_K)$ is a product of the functions $Q_i(\eta_i)$. Each $Q_i(\eta_i)$ is an analytic function and is continuous in $\eta_i$ for $\eta_i\in[0,1]$. Also, $Q_i(\eta_i)$ is positive for $\eta_i\in[0,1]$: it is a concave function of $\eta_i$ whose real roots, $1/z_{2i-1}>1$ and $1-1/z_{2i}<0$, lie outside the interval $[0,1]$. Thus $Q_i(\eta_i)$ is bounded, and there exists $Q_{i*}>0$ such that $Q_i(\eta_i)\ge Q_{i*}$ for all $\eta_i\in[0,1]$. The other term of the denominator is $\prod_{i=1}^{K}\prod_{j=i+1}^{K}(1-\eta_j/\eta_i)^{2/\mathcal A}$, each factor of which lies between zero and one, since $0<\eta_j<\eta_i<1$ for $j>i$. Hence the product is bounded, and there exists $\tau_0>0$ such that

$$\prod_{i=1}^{K}\prod_{j=i+1}^{K}(1-\eta_j/\eta_i)^{2/\mathcal A}\ge\tau_0.$$
4. It can thus be seen that $m(\eta_1,\eta_2,\ldots,\eta_K)$ is a nonnegative, continuous function of $(\eta_1,\eta_2,\ldots,\eta_K)$, with bounds

$$0\le m(\eta_1,\eta_2,\ldots,\eta_K)\le\frac{\bar Q}{\tau_0\prod_{i=1}^{K}Q_{i*}}=m_0\quad\text{for all }\eta_i\in[0,1].$$
3.2.2: Expansion of $m(\eta_1,\eta_2,\ldots,\eta_K)$

In this section, the function $m(\eta_1,\eta_2,\ldots,\eta_K)$ is expanded using a Taylor series expansion. This enables us to rewrite the posterior density of $(\eta_1,\eta_2,\ldots,\eta_K)$ and hence to compute the posterior mean of $\eta_i$.

The function $m(\eta_1,\eta_2,\ldots,\eta_K)$ is defined as before:

$$m(\eta_1,\ldots,\eta_K)=\frac{Q(\eta_1,\ldots,\eta_K)}{\prod_{i=1}^{K}Q_i(\eta_i)\prod_{j=i+1}^{K}(1-\eta_j/\eta_i)^{2/\mathcal A}}.$$


Consider the function

$$[m(\eta_1,\ldots,\eta_K)]^{-\mathcal A/2}=m_0^{-\mathcal A/2}\,[m(\eta_1,\ldots,\eta_K)/m_0]^{-\mathcal A/2}.$$

Define $f(\eta_1,\ldots,\eta_K)=-(\mathcal A/2)\ln[m(\eta_1,\ldots,\eta_K)/m_0]$. The logarithmic expansion

$$\ln(1-x)=-x-\frac{x^2}{2}-\frac{x^3}{3}-\cdots,\qquad-1\le x<1,$$

will be applied with $x=1-m(\eta_1,\ldots,\eta_K)/m_0$:

$$\ln\frac{m(\eta_1,\ldots,\eta_K)}{m_0}=-\sum_{i=1}^{\infty}\frac{[1-m(\eta_1,\ldots,\eta_K)/m_0]^i}{i},$$

where $0\le x<1$. Note that when $x=1$, that is, when $m(\eta_1,\ldots,\eta_K)=0$, the expansion is not valid; this occurs where $\eta_i=0$ for all $i$, and this set of points is ignored in the analysis [see Appendix B]. Thus, we write

$$[m(\eta_1,\ldots,\eta_K)/m_0]^{-\mathcal A/2}=\exp\left\{\frac{\mathcal A}{2}\sum_{i=1}^{\infty}\frac{[1-m(\eta_1,\ldots,\eta_K)/m_0]^i}{i}\right\},$$

where $0<m(\eta_1,\ldots,\eta_K)\le m_0$ and $\mathcal A=T+\lambda+vK$.
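The rewriting of $[m/m_0]^{-\mathcal A/2}$ as the exponential of a logarithmic series can be checked numerically. In the sketch, `ratio` stands for $m(\eta_1,\ldots,\eta_K)/m_0$ and `a_cal` for $\mathcal A$; both values are hypothetical, and the series is truncated at a fixed number of terms.

```python
import math


def power_via_log_series(ratio, a_cal, terms=200):
    """Approximate ratio ** (-a_cal / 2), for 0 < ratio <= 1, as
    exp((a_cal / 2) * sum_{i=1}^{terms} x**i / i) with x = 1 - ratio,
    i.e. using -ln(1 - x) = x + x**2/2 + x**3/3 + ..."""
    x = 1.0 - ratio
    series = sum(x ** i / i for i in range(1, terms + 1))
    return math.exp(0.5 * a_cal * series)


approx = power_via_log_series(0.6, 7.0)
exact = 0.6 ** -3.5
```

The series converges more slowly as `ratio` approaches zero (that is, as $x$ approaches one), consistent with the exclusion of the points where $m=0$.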


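We rewrite $[m(\eta_1,\ldots,\eta_K)/m_0]^{-\mathcal A/2}$ as follows:

$$[m(\eta_1,\ldots,\eta_K)/m_0]^{-\mathcal A/2}=\sum_{k=0}^{\infty}\frac{(\mathcal A/2)^k}{k!}\left\{\sum_{i=1}^{\infty}\frac{[1-m(\eta_1,\ldots,\eta_K)/m_0]^i}{i}\right\}^k,$$

using the expansion

$$\exp x=1+x+\frac{x^2}{2!}+\cdots\quad\text{for all }x\in\mathbb{R}.$$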


The left-hand side (LHS) can be further simplified using the multinomial expansion for $(\sum_ix_i)^k$ and is expressed as

$$\text{LHS}=\sum_{k=0}^{\infty}\left(\frac{\mathcal A}{2}\right)^k\sum_{\{l_i\}}\prod_{i=1}^{\infty}\frac{[1-m(\eta_1,\ldots,\eta_K)/m_0]^{i\,l_i}}{l_i!\,i^{l_i}},$$

where $\sum_{i=1}^{\infty}l_i=k$ [see Appendix C for a proof that the multinomial expansion for the finite sum can be extended to the case of the infinite sum].
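The finite multinomial expansion, which Appendix C extends to infinite sums, can be verified directly for a few terms. This sketch enumerates every exponent vector $(l_1,\ldots,l_m)$ with $\sum l_i=k$; it is exponential in $m$ and intended only as a check.

```python
import math
from itertools import product


def multinomial_expansion(xs, k):
    """Right-hand side of (x_1 + ... + x_m)**k = sum over l_1 + ... + l_m = k of
    k! / (l_1! ... l_m!) * x_1**l_1 * ... * x_m**l_m."""
    total = 0.0
    for ls in product(range(k + 1), repeat=len(xs)):
        if sum(ls) != k:
            continue  # keep only exponent vectors that sum to k
        coeff = math.factorial(k) // math.prod(math.factorial(l) for l in ls)
        term = 1.0
        for x, l in zip(xs, ls):
            term *= x ** l
        total += coeff * term
    return total


xs = [0.3, -0.2, 0.5]
lhs = sum(xs) ** 4
rhs = multinomial_expansion(xs, 4)
```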
Using a further multinomial expansion and simplifying [see Appendix D for details], the function $[m(\eta_1,\eta_2,\ldots,\eta_K)]^{-\mathcal A/2}$ is written as


_ H k i 
[MON Mav) A? & > » = > ae 


gh(a / 2) I-19?" 2" i=l 
ee ee a ey ee Ne eee em ae 
Aldo Wm) (Qo) | [lad [LAr tay? | | Ga !ai29 
1=1 i=] ‘ i=] 
(3.5) 
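3.2.3: Posterior Density of $(\eta_1,\eta_2,\ldots,\eta_K)$

Using the expansion for $[m(\eta_1,\eta_2,\ldots,\eta_K)]^{-\mathcal A/2}$ in (3.5), we are able to write the posterior density of $(\eta_1,\eta_2,\ldots,\eta_K)$ as follows:

$$p''(\eta_1,\ldots,\eta_K)\propto\prod_{i=1}^{K}G_i(\eta_i)\,[m(\eta_1,\ldots,\eta_K)]^{-\mathcal A/2}\quad\text{if all }\eta_i\in(0,1),\qquad p''(\eta_1,\ldots,\eta_K)=0\quad\text{if any }\eta_i=0,$$

where

$$G_i(\eta_i)=\frac{\eta_i^{K-i-(v+K+1)/2}(1-\eta_i)^{n}}{[Q_i(\eta_i)]^{\mathcal A/2}}.$$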
Thus,


ee tid aif [nh- my 
2" Ms Marx) © >, 2: > > > 


as N=0 {7:3=0 d= = intA/2 
k=0 {L}=0 {j}=0 dj=0 h=0 | [iownat" / 


if all $\eta_i\in(0,1)$,
$=0$, if any $\eta_i=0$,


where 
2() = 2(kl, A, j,%;,(dy),(p;), (A), Qo» Gn) 


K . — iam e — 
[ [ean 4 /2)*2! (29) (Qo) ai? (A)P2 
_ t=l 
a 
(a Can !do | [da Ido !) 
i=] i=] 
and h, = d;» +(i-2)h -dy +1-(v+K+1)/2. Thus, 


$$E[\eta_i\mid y,X]=\frac{\int\cdots\int\eta_i\,p''(\eta_1,\ldots,\eta_K\mid y,X)\prod_{j=1}^{K}d\eta_j}{\int\cdots\int p''(\eta_1,\ldots,\eta_K\mid y,X)\prod_{j=1}^{K}d\eta_j}.$$

We define the following integral, which is a generalization of the hypergeometric function studied by Appell⁷:

$$f[A,B,C;z_{2i-1},z_{2i}]=\int_0^1\frac{\eta_i^{A}(1-\eta_i)^{B}}{[(1-z_{2i-1}\eta_i)(1-z_{2i}(1-\eta_i))]^{C}}\,d\eta_i$$

for all $i=1,2,\ldots,K$, where $z_{2i-1},z_{2i}\in[0,1]$. In terms of $f(\cdot)$, we can obtain $E[\eta_i\mid y,X]$ as follows:


$$E[\eta_i\mid y,X]=\frac{\displaystyle\sum_{k=0}^{\infty}\sum_{\{l\}}\sum_{\{j\}}\sum_{d_j}\sum_{h}z(\cdot)\,f[A_i+1,B_i,C;z_{2i-1},z_{2i}]\prod_{j\neq i}f[A_j,B_j,C;z_{2j-1},z_{2j}]}{\displaystyle\sum_{k=0}^{\infty}\sum_{\{l\}}\sum_{\{j\}}\sum_{d_j}\sum_{h}z(\cdot)\prod_{j=1}^{K}f[A_j,B_j,C;z_{2j-1},z_{2j}]},\qquad(3.6)$$

where

$A_i=h_i$ for all $i=1,2,\ldots,K$,
$B_i=n+d_i$ for all $i=1,2,\ldots,K$,
Cae] 2) 

This can be seen to be a generalization of the posterior expectation obtained in Hill (1977), p. 133. For the balanced one way analysis of variance model, Hill obtains the posterior expectation of a parameter, comparable to $\eta$ in our model, as a ratio of hypergeometric functions. In our model, when there is only one explanatory variable, the posterior expectation of $\eta$ would likewise be a ratio of hypergeometric functions.

Hill's (1977)⁸ results enable us to obtain exact and approximate solutions for functions of the form $f[A_i,B_i,C;z_{2i-1},z_{2i}]$, which will be discussed in Appendix E. We can state the following result:


Theorem 3.3:

1. Consider the reparametrized linear regression model (3.2) with normal disturbances and a normal prior on $\alpha$ (the vector of coefficients). In addition, we assume the case of only one eigenvalue $\lambda$ (of $X'X$ in the metric of $\Sigma_\beta$) [see Case 1]. Let $\alpha_i(A)$ be the Bayes estimator of $\alpha_i$ for all $i$. Then $\alpha_i(A)=E[\eta_i\mid y,X]\,\alpha_i(0)$ for all $i=1,2,\ldots,K$, where $E[\eta_i\mid y,X]$ is given by (3.6), given the conditional Haar invariant distribution of the matrix $G$. $\alpha_i(A)$ is admissible under a squared error loss function.

2. Consider the model specified in (3.1). Let $\beta(A)$ be the Bayes estimator of $\beta$. Then $\beta(A)=[(A')^{-1}\eta A']\beta(0)$, given the conditional Haar invariant distribution of $G$, where

$$\eta=\operatorname{Diag}\{E(\eta_1\mid y,X),\ldots,E(\eta_K\mid y,X)\}.$$

$\beta(A)$ is admissible for $\beta$. Furthermore,

$$\beta(A)=\{[A'(U)]^{-1}\eta A'(U)\}\beta(0),$$

where $A'(U)$ is the expectation of $A'$ taken with respect to the conditional Haar invariant distribution of $G$.
Proof:

1. Consider the model in (3.2). The Bayes estimator for a squared error loss function has been derived for this model in Chapter 2. The Bayes estimator is

$$\alpha_i(A)=E[\eta_i\mid y,X]\,\alpha_i(0)\quad\text{for all }i,$$

where $\eta_i$ has been defined in Section 3.1. Both $\eta_i$ and $\alpha_i(0)$ depend on the transformation matrix discussed earlier. We are able to factor $\alpha_i(0)$ out of the posterior expectation because the roots $\sigma^2_{\alpha_i}$ and the matrix $G$ of their normalized eigenvectors are independently distributed. The matrix of eigenvectors $G$ has a conditional Haar invariant distribution, described in greater detail in Anderson (1984), pp. 321-322, and Kshirsagar (1972), p. 441. Hence, the transformation matrix $A$, which is constructed from $G$, is distributed (given $X$) independently of $\sigma^2_{\alpha_i}$ for all $i$, and hence of $\eta_i$ for all $i$. $E[\eta_i\mid y,X]$ is obtained in (3.6), enabling us to compute $\alpha_i(A)$ for all $i$ given the conditional Haar invariant distribution of $G$. Further, the Bayes estimator $\alpha_i(A)$ is admissible for all $i$ by Theorem 2.3, Chapter 2.

2. For the model specified in (3.1), the Bayes estimator of $\beta$ is given by $\beta(A)=(A')^{-1}\alpha(A)$, since $\beta=(A')^{-1}\alpha$. Given $y$, $X$ and $G$, and since $\alpha(A)$ is Bayes for $\alpha$, it follows that $\beta(A)$ is Bayes for $\beta$. We write $\beta(A)=[(A')^{-1}\eta A']\beta(0)$, given the conditional Haar invariant distribution of $G$, where

$$\eta=\operatorname{Diag}\{E(\eta_1\mid y,X),\ldots,E(\eta_K\mid y,X)\}.$$

By applying Theorem 2.3, Chapter 2, to model (3.1), it follows that $\beta(A)$ is admissible for $\beta$. Hence,

$$\beta(A)=\{[A'(U)]^{-1}\eta A'(U)\}\beta(0),$$

where $A'(U)$ is the expectation of $A'$ taken with respect to the conditional Haar invariant distribution of $G$. ∎


3.2.4: Two Regressor Examples with High Degree of 
Multicollinearity 


To illustrate the techniques developed in the context of the general linear model, we consider a model with two independent variables. In the example, a specific realization of the independent variables is considered since, in general, they are assumed to be stochastic. Hence, the elements of $X'X$ are viewed as specific values taken by these random variables. The example considers a case in which the two variables are highly collinear. Economic data often exhibit multicollinearity, and in this section we hope to demonstrate that, in such special cases, the computation of the posterior mean of the shrinkage factors is simplified by making use of the high collinearity.


We consider a subset of the data used by Klein (1950)⁹ in his first econometric model of the U.S. economy for the years 1921-1929. The data consist of three national aggregates, viz., aggregate consumption, aggregate income from profits and aggregate income from wages [see Vinod and Ullah (1981)¹⁰, p. 10]. The model being considered is

$$y=\alpha+x_1\beta_1+x_2\beta_2+\varepsilon,$$

where:

$y$: $T\times1$ vector of aggregate consumption.
$x_1$: $T\times1$ vector of aggregate income from profits.
$x_2$: $T\times1$ vector of aggregate income from wages.
$\varepsilon$: $T\times1$ vector of disturbances.
$\alpha$: constant term in the model.
$\beta_1$: marginal propensity to consume from profit income.
$\beta_2$: marginal propensity to consume from wage income.


3.3 Description of Data 


We provide a summary of the data as follows. The data on $y$, $x_1$, $x_2$ are considered for the years 1921-1929. The sums of squares and sums of cross products are given by

$$y'y=24346.95,\quad x_1'x_1=32506,\quad x_2'x_2=133310,\quad x_1'x_2=65759,\quad x_1'y=8880.25,\quad x_2'y=16466.07.$$

The eigenvalues of $X'X$ are 16576 and 5.53. Using the transformation matrix $A$, we compute $Z$. Having done so, we obtain

$$q_1=z_1'y=126.73,\qquad q_2=z_2'y=95.38.$$

We assume an inverted Wishart prior on $\Sigma_\beta$ with parameters $P$, 4 and 2. In addition, we assume that $P$ differs from $X'X$ in a negligible way, so that the eigenvalues of $X'X$ in the metric of $\Sigma_\beta$ are approximately all one.


This section focuses on the marginal propensities to consume $\beta_1$ and $\beta_2$; hence the parameter $\alpha$ and its estimation are not discussed. The elements of the covariance matrix of the prior on $\beta$ are assumed to be

$$\sigma^2_{\beta_1}=2,\qquad\sigma^2_{\beta_2}=1,\qquad\operatorname{cov}(\beta_1,\beta_2)=0.$$

(We assume normal priors on both $\beta_1$ and $\beta_2$, as these parameters of the model can possibly take negative values as well as positive values greater than one. Although theoretically one would expect the marginal propensity to consume to lie between zero and one, when we split income into wage income and profit income the mpcs need not lie between zero and one. Thus, the assumption of normal priors in this case is not unreasonable.) Solving for the roots of the determinantal equation described earlier, we get $\sigma^2_{\alpha_1}=20197.44$ and $\sigma^2_{\alpha_2}=11.19$. The choice of the prior parameters is $\lambda=4$, $C_0=4$ and $v=4$. The prior parameters that need to be chosen in this example are $P$, $\lambda$ and $v$. Their choice is motivated by the fact that these parameters determine the moments of the prior distributions of $\sigma^2$ and $\Sigma_\beta$. In particular,

$$E[\sigma^2\mid\lambda,C_0]=C_0/(\lambda-2)$$

and

$$E[\Sigma_\beta\mid P,v,K]=P/(v-K-1).$$
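Solving the determinantal equation for two regressors reduces to a quadratic. Multiplying $|\Sigma_\beta-\sigma^2_\alpha(X'X)^{-1}|=0$ through by $|X'X|$ shows that the roots are the eigenvalues of $X'X\,\Sigma_\beta$. The sketch below uses a hypothetical, strongly collinear $X'X$ (not the Klein data) together with the prior variances 2 and 1 used above; the function names are illustrative.

```python
import math


def characteristic_roots_2x2(xtx, sigma_beta):
    """Roots s of |Sigma_beta - s (X'X)^{-1}| = 0 for K = 2, largest first.

    Multiplying the equation by |X'X| shows the roots are the eigenvalues of
    X'X Sigma_beta; for a 2 x 2 matrix they follow from its trace and determinant."""
    m = [[sum(xtx[i][k] * sigma_beta[k][j] for k in range(2)) for j in range(2)]
         for i in range(2)]                      # X'X Sigma_beta
    tr = m[0][0] + m[1][1]
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    disc = math.sqrt(tr * tr - 4.0 * det)        # real: the product is similar to a symmetric matrix
    return (tr + disc) / 2.0, (tr - disc) / 2.0


def determinantal_equation(xtx, sigma_beta, s):
    """Evaluate |Sigma_beta - s (X'X)^{-1}|; it should vanish at each root."""
    (a, b), (c, d) = xtx
    dxx = a * d - b * c
    inv = [[d / dxx, -b / dxx], [-c / dxx, a / dxx]]
    m = [[sigma_beta[i][j] - s * inv[i][j] for j in range(2)] for i in range(2)]
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]


xtx = [[10.0, 9.0], [9.0, 10.0]]       # hypothetical, strongly collinear regressors
sigma_beta = [[2.0, 0.0], [0.0, 1.0]]  # prior variances 2 and 1, zero covariance
roots = characteristic_roots_2x2(xtx, sigma_beta)
```

With collinear regressors one root is much larger than the other, mirroring the very unequal roots reported for the example.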


In this case, the posterior density of $(\eta_1,\eta_2)$ is

$$p''(\eta_1,\eta_2)\propto\frac{\eta_1^{1-(v+K+1)/2}\,\eta_2^{-(v+K+1)/2}\prod_{i=1}^{2}(1-\eta_i)^{n}\,(1-\eta_2/\eta_1)}{[Q(\eta_1,\eta_2)]^{\mathcal A/2}},$$

where $n=(K+v)/2-2$. Thus we can obtain
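$$E[\eta_i\mid y,X]=\frac{\displaystyle\sum_{k=0}^{\infty}\sum_{\{l\}}\sum_{\{j\}}\sum_{d_j}\sum_{h}z(\cdot)\,f[A_i+1,B_i,C;z_{2i-1},z_{2i}]\prod_{j\neq i}f[A_j,B_j,C;z_{2j-1},z_{2j}]}{\displaystyle\sum_{k=0}^{\infty}\sum_{\{l\}}\sum_{\{j\}}\sum_{d_j}\sum_{h}z(\cdot)\prod_{j=1}^{2}f[A_j,B_j,C;z_{2j-1},z_{2j}]},$$

where

$A_i=h_i$ for all $i=1,2$,
$B_i=n+d_i$ for all $i=1,2$,
$C=(j_g+\mathcal A/2)$.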

This follows from applying the result for the general $K$-dimensional case to the present case of two variables. Approximate results are obtained in this case by making use of the collinearity between the two variables and obtaining an approximate posterior density. In particular, the denominator of the posterior density is

$$Q(\eta_1,\eta_2)=Q_0+\sum_{i=1}^{2}\frac{\lambda(1-\eta_i)}{\eta_i}-\sum_{i=1}^{2}q_i^2\eta_i,$$

which is approximately written as

$$Q(\eta_1,\eta_2)\approx(1-0.66\eta_1)(1-0.13\eta_2).$$

Since $\lambda=1$, the terms involving $\lambda$ drop out of the denominator, and hence we are able to write $Q$ in the way we did above. The posterior density of $(\eta_1,\eta_2)$ is approximately written as

$$p''(\eta_1,\eta_2)\propto\frac{\eta_1^{1-(v+K+1)/2}\,\eta_2^{-(v+K+1)/2}\prod_{i=1}^{2}(1-\eta_i)^{n}\,(1-\eta_2/\eta_1)}{[(1-0.66\eta_1)(1-0.13\eta_2)]^{\mathcal A/2}}.$$
This enables us to compute $E(\eta_1\mid y,X)$ and $E(\eta_2\mid y,X)$; in particular,

$$E(\eta_2\mid y,X)=\frac{f[1-(v+K+1)/2,\,n,\,\mathcal A/2;\,0,\,0.13]}{f[-(v+K+1)/2,\,n,\,\mathcal A/2;\,0,\,0.13]}.$$

On using numerical integration methods and making certain approximations, we obtain the following values:

$$E[\eta_1\mid y,X]=0.43,\qquad E[\eta_2\mid y,X]=0.06.$$

We now present and compare the OLS estimates and the Bayes estimates.


3.3.1: OLS Estimates 


The OLS solution yields $\beta_1(0)=-0.17$ and $\beta_2(0)=1.12$. The OLS estimates of $\beta_1$ and $\beta_2$ imply that the mpc from profit income is $-0.17$ and from wage income is $1.12$. Thus, whereas consumption declines by 17 cents for a one-dollar increase in profit income, it increases by one dollar and twelve cents for a one-dollar increase in wage income. The confidence intervals for $\beta_1(0)$ and $\beta_2(0)$ are reported below:

(a) The 95 percent confidence interval for $\beta_1(0)$ based on a two-sided t-test is $(-1.03, 0.69)$.

(b) The 95 percent confidence interval for $\beta_2(0)$ based on a two-sided t-test is $(0.69, 1.55)$. The estimate of $\beta_2$ seems unacceptable, as the wage earner would become bankrupt with such a high mpc.


3.3.2: Bayes Estimates 


The Bayes estimates are given by

$$\beta(A)=D\beta(0)+[I-D]\bar\beta,$$

where

$$D=(A')^{-1}\eta A'$$

and $\bar\beta$ is the vector of prior means. An approximation made here is that we use the sample-based value of the transformation matrix $A$ rather than its expected value with respect to its distribution. Substituting the values of the posterior means of $\eta_1$ and $\eta_2$, we get as the Bayes estimates

$$\beta_1(A)=(0.1773)(-0.17)+(0.2339)(1.12)+(0.8227)(0.20)-(0.1637)(0.70)=0.2817$$

and

$$\beta_2(A)=(0.1673)(-0.17)+(0.2721)(1.12)-(0.1637)(0.20)+(0.7279)(0.70)=0.7531,$$

where the prior mean of $\beta_1$ is assumed to be 0.20 and the prior mean of $\beta_2$ is assumed to be 0.70.
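The arithmetic above can be reproduced from the printed coefficients. The sketch keeps every coefficient exactly as printed (including 0.1637 in both rows) and rounds to four decimals; $\beta_1(A)$ comes out as 0.2818, which matches the printed 0.2817 up to rounding in the last digit.

```python
# Coefficients and estimates exactly as printed in the text.
beta_ols = {"b1": -0.17, "b2": 1.12}     # beta_1(0), beta_2(0)
prior_mean = {"b1": 0.20, "b2": 0.70}    # prior means of beta_1 and beta_2

beta1_bayes = (0.1773 * beta_ols["b1"] + 0.2339 * beta_ols["b2"]
               + 0.8227 * prior_mean["b1"] - 0.1637 * prior_mean["b2"])
beta2_bayes = (0.1673 * beta_ols["b1"] + 0.2721 * beta_ols["b2"]
               - 0.1637 * prior_mean["b1"] + 0.7279 * prior_mean["b2"])

print(round(beta1_bayes, 4), round(beta2_bayes, 4))
```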


Plots of $L(\eta_1,\eta_2)$ and $p''(\eta_1,\eta_2)$ are shown in Figures 3.1 and 3.2. These functions are written explicitly as follows:

$$L(\eta_1,\eta_2)\propto\frac{\prod_{i=1}^{2}(1-\eta_i)^{1/2}}{[y'y-q_1^2\eta_1-q_2^2\eta_2]^{T/2-1}},$$

$$p''(\eta_1,\eta_2)\propto\frac{\eta_1^{1-(v+K+1)/2}\,\eta_2^{-(v+K+1)/2}\prod_{i=1}^{2}(1-\eta_i)^{n}\,(1-\eta_2/\eta_1)}{[Q(\eta_1,\eta_2)]^{\mathcal A/2}},$$

where $n=(K+v)/2-2$. The contrast between the two plots displays how the likelihood function is modified when it is combined with the prior distribution; the posterior thus summarizes the learning that occurs after the data are used. Whereas the mode of the likelihood occurs at $\eta_1=1$, $\eta_2=1$, which is the OLS solution, the mode of the posterior distribution occurs where $\eta_1\approx1$ and $\eta_2\approx0$.
Fig. 3.1: Plot of the likelihood function $L(\eta_1,\eta_2)$
Fig. 3.2: Plot of the posterior density $p''(\eta_1,\eta_2)$
1. J. Berger (1982), “Bayesian Robustness and the Stein Effect”, JASA 77, June 1982.

2. The transformation matrix $A$ is unique if there are no multiple roots $\sigma^2_\alpha$ of the characteristic equation being considered. The analysis presumes that these roots are distinct; the case where the roots are not distinct is a special case of the more general case considered here.

3. For a proof of this proposition, see T. W. Anderson (1984), An Introduction to Multivariate Statistical Analysis, second edition, John Wiley, New York, 1984; Appendix I, p. 589.

4. It is to be hoped that by assuming a prior on the unknown variance $\sigma^2$, the degree and the extent of the misspecification are decreased.

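5. Consider the matrix

$$B=\Lambda^{1/2}C'\Sigma_\beta C\Lambda^{1/2}.$$

The eigenvalues of $B$ are obtained as follows. Consider the equation satisfied by $\sigma^2_\alpha$:

$$|\Sigma_\beta-\sigma^2_\alpha(X'X)^{-1}|=0.$$

This is modified and written as

$$|C'\Sigma_\beta C-\sigma^2_\alpha\Lambda^{-1}|=0,$$

where $C$ is the matrix defined earlier. This shows that the eigenvalues of $\Lambda^{1/2}C'\Sigma_\beta C\Lambda^{1/2}$ are given by $\sigma^2_\alpha$.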


6. E. E. Leamer (1978), Specification Searches, John Wiley, 1978, Appendix II, p. 336; the normality of $y$ follows from the theorem on p. 336, which states that a linear combination of normal random variables is also normal.
A Multivariate Transformation and
Bayesian Inference in the Linear
Regression Model


4.1: Introduction 


A multivariate transformation is proposed in this paper that is seen to be important for a Bayesian analysis of the linear regression model, and a new, operational version of the Bayes estimator for this model is derived. The standard linear regression model is assumed, with a normally distributed error vector and a multivariate normal prior on the unknown parameter. It is well known that the Bayes estimator¹ in the context of the regression model with squared error loss is admissible for proper priors [see Leamer (1978)]. However, this estimator cannot be used in applied work because it depends on unknown parameters of the model, such as the error covariance, the covariance of the prior distribution and the prior mean.


An important approach in the recent literature has been to use the Bayes estimator after replacing the unknown parameters in its formula by sample-based estimates. These are called ‘Empirical Bayes’ estimators (some of the Stein type estimators correspond to these). Several such estimators have become increasingly important as alternatives to the OLS estimator because of their superior mean square error properties [see Leamer (1978), James and Stein (1961)]. However, such estimators lose the attractive theoretical properties of the Bayes estimator.


This paper is concerned with the derivation of a new, operational version of the Bayes estimator that retains the theoretically attractive properties of the Bayes estimator as far as possible. It will be seen that the unknown covariances in the model enter the formula of the Bayes estimator through what we call a ‘shrinkage factor’. It is this unknown shrinkage factor that prevents the Bayes estimator from being used in applied work. The approach here is to assume priors on the unknown covariances in the model and to employ a multivariate transformation that reduces the general linear regression model to a simpler model, which in turn identifies the shrinkage factor in univariate terms. Having done this, we show how a new, operational version of the Bayes estimator can be obtained, although this is not the main focus of this paper. This in turn shows the usefulness of the transformation in deriving the new estimator.


The plan of the paper is as follows: Section 4.2 discusses 
the model in detail together with the assumptions made. 
Section 4.3 introduces the transformation, and the matrix 
that induces this transformation together with the properties 
of the matrix. Section 4.4 discusses the role of this 
transformation in deriving a new, operational version of the 
Bayes estimator. Conclusions are drawn in Section 4.5. 


4.2: The Linear Regression Model 


The linear regression model is written in its general form as

$$y = X\beta + \varepsilon \qquad (4.1)$$

where

$y$ is the $T \times 1$ vector of observations on the dependent variable;

$X$ is the $T \times K$ matrix of explanatory variables, assumed to be stochastically distributed independently of $(\beta, \sigma^2)$; $X'X$ is assumed to be nonsingular;

$\beta$ is the $K \times 1$ vector of parameters, assumed to have a multivariate normal prior with mean $\mu$ and covariance matrix $\Sigma_\beta$;

$\varepsilon | X$ is the $T \times 1$ error vector, assumed to be normal with mean zero and a scalar covariance matrix $\sigma^2 I$, given $X$.


The uncertainty associated with the unknown parameter vector $\beta$ is expressed in the form of a multivariate normal prior on $\beta$. This is consistent with the belief that $\beta$ is randomly distributed with the specified multivariate normal distribution. The prior assumed on $\beta$ is only an approximation to the true prior; as pointed out by Berger (1982), “Usually, only subjective approximations to the true prior can be constructed”. A model of this type would correspond to a “random effects” model in the classical framework [see Scheffé (1957), p. 6].


The Bayes estimator of $\beta$ in model (4.1) is

$$\hat\beta = (X'X + \sigma^2\Sigma_\beta^{-1})^{-1}\,[(X'X)\hat\beta(0) + \sigma^2\Sigma_\beta^{-1}\mu] \qquad (4.2)$$

where $\hat\beta(0)$ is the OLS estimator of $\beta$. The Bayes estimator depends on the data set $X$, $y$ and on the unknown covariances $\sigma^2$ and $\Sigma_\beta$. Since $\sigma^2$ and $\Sigma_\beta$ are not known in general, the Bayes estimator of the parameter cannot be computed in applied work from the data alone. The approach here is to assume proper priors on $\sigma^2$ and $\Sigma_\beta$ and pursue a pure Bayesian analysis that overcomes this problem as far as possible.


Section 4.3 describes the multivariate transformation that simplifies model (4.1), as well as the Bayes estimator that corresponds to the simplified model. The properties of the transformation matrix are also discussed in that section.


4.3: The Transformation 

The general model (4.1) can be reduced to simpler models using a transformation that simultaneously diagonalises $(X'X)^{-1}$ and $\Sigma_\beta$. This is more general than the reparametrizing transformation discussed in the principal components and ridge regression literature [see Massy (1965), Oman (1978)]. The transformation involves a matrix that will depend on the uncertainty in $\Sigma_\beta$ and on the data $X$. This enables us to transform the parameter space of $\beta$ to the parameter space of $\alpha$. The transformation is described by the following theorem, stated without proof:


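Theorem 4.1:
Consider the model (4.1) and assume that $\Sigma_\beta$ and $(X'X)^{-1}$ are positive definite. Then there exists a random, nonsingular $K \times K$ matrix $A$, for fixed observed $X$, such that

$$A'(X'X)^{-1}A = I \quad\text{and}\quad A'\Sigma_\beta A = \Sigma_\alpha,$$

where $\Sigma_\alpha = \mathrm{Diag}(\sigma^2_{\alpha 1}, \sigma^2_{\alpha 2}, \ldots, \sigma^2_{\alpha K})$ and the $\sigma^2_{\alpha i}$, for all $i$, are the roots of the determinantal equation

$$\left|\Sigma_\beta - \sigma^2_\alpha (X'X)^{-1}\right| = 0.$$

For positive definite $\Sigma_\beta$, $\sigma^2_{\alpha i} > 0$ for all $i$.
The randomness in the matrix $A$ comes from the uncertainty in $\Sigma_\beta$ (as embodied by the assumed prior on $\Sigma_\beta$), as the entire analysis is conditional on the dataset $X$.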
The usefulness and importance of this transformation lie in simplifying and reparametrizing the model, as is seen by rewriting model (4.1) as

$$y = Z\alpha + \varepsilon \qquad (4.3)$$

where $Z = X(A')^{-1}$, $Z'Z = I$ and $\alpha = A'\beta$.
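Theorem 4.1 can be illustrated numerically. The sketch below is only an illustration under stated assumptions: $X$ is simulated, the value of $\Sigma_\beta$ is made up, and the helper name `simultaneous_diagonalizer` is hypothetical. It builds $A$ by a Cholesky reduction of the generalized symmetric eigenproblem $|\Sigma_\beta - \sigma^2_\alpha (X'X)^{-1}| = 0$ and checks that $A'(X'X)^{-1}A = I$, $A'\Sigma_\beta A = \Sigma_\alpha$ and $Z'Z = I$ for $Z = X(A')^{-1}$.

```python
import numpy as np

def simultaneous_diagonalizer(xtx_inv, sigma_beta):
    """Hypothetical helper: return (A, roots) with A' xtx_inv A = I and
    A' sigma_beta A = diag(roots), roots sorted from largest to smallest.

    The roots solve |sigma_beta - s * xtx_inv| = 0 (generalized eigenproblem).
    """
    L = np.linalg.cholesky(xtx_inv)            # xtx_inv = L L'
    L_inv = np.linalg.inv(L)
    M = L_inv @ sigma_beta @ L_inv.T           # symmetric, same roots as the pencil
    roots, Q = np.linalg.eigh(M)               # eigh returns ascending order
    order = np.argsort(roots)[::-1]            # largest root first
    roots, Q = roots[order], Q[:, order]
    A = L_inv.T @ Q
    return A, roots

rng = np.random.default_rng(0)
T, K = 50, 3
X = rng.normal(size=(T, K))                    # simulated regressors
xtx_inv = np.linalg.inv(X.T @ X)
R = rng.normal(size=(K, K))
sigma_beta = R @ R.T + K * np.eye(K)           # illustrative prior covariance

A, roots = simultaneous_diagonalizer(xtx_inv, sigma_beta)
Z = X @ np.linalg.inv(A.T)                     # Z = X (A')^{-1}, as in (4.3)

print(np.allclose(A.T @ xtx_inv @ A, np.eye(K)))          # True
print(np.allclose(A.T @ sigma_beta @ A, np.diag(roots)))  # True
print(np.allclose(Z.T @ Z, np.eye(K)))                    # True
```

Sorting the roots from largest to smallest follows the ordering used for uniqueness of $A$; any consistent ordering of the columns gives an equally valid transformation.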


The model given in (4.3) depends on the transformation matrix $A$ and is conditional on the dataset $X$. The prior assumed on $(\beta, \Sigma_\beta)$ induces a prior distribution on $(\alpha, \Sigma_\alpha)$ which depends on this transformation. Model (4.3) is such that both $Z'Z$ and $\Sigma_\alpha$ are diagonal, resulting in a simplification of the Bayes estimator that makes it possible to write it in univariate terms. This is seen by writing the Bayes estimator of $\alpha$ for model (4.3) as

$$\hat\alpha = (I + \sigma^2\Sigma_\alpha^{-1})^{-1}\,[\hat\alpha(0) + \sigma^2\Sigma_\alpha^{-1}A'\mu],$$

as $Z'Z = I$.


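The $i$th component of $\hat\alpha$ is

$$\hat\alpha_i = \eta_i\,\hat\alpha_i(0) + (1 - \eta_i)\,a_i'\mu \qquad (4.4)$$

for all $i = 1, 2, \ldots, K$, where $\eta_i = \sigma^2_{\alpha i}/(\sigma^2 + \sigma^2_{\alpha i})$, $\hat\alpha_i(0)$ is the OLS estimator of $\alpha_i$, and $a_i$ is the $i$th column of $A$.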
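Because $Z'Z = I$ and $\Sigma_\alpha$ is diagonal, the matrix form of $\hat\alpha$ splits into the $K$ scalar formulas (4.4). The sketch below checks this with made-up values of $\hat\alpha(0)$, $A'\mu$, $\sigma^2$ and the $\sigma^2_{\alpha i}$ (illustrative assumptions, not values from the text); the function name `bayes_alpha` is hypothetical.

```python
import numpy as np

def bayes_alpha(alpha_ols, mu_alpha, sigma2, sigma2_alpha):
    """Componentwise (4.4): eta_i * alpha_i(0) + (1 - eta_i) * a_i' mu."""
    eta = sigma2_alpha / (sigma2 + sigma2_alpha)     # shrinkage factors eta_i
    return eta * alpha_ols + (1.0 - eta) * mu_alpha, eta

# Illustrative (made-up) inputs for K = 3.
alpha_ols = np.array([2.0, -1.0, 0.5])    # alpha(0), the OLS estimate of alpha
mu_alpha = np.array([1.0, 0.0, 0.0])      # A' mu, the transformed prior mean
sigma2 = 1.0
sigma2_alpha = np.array([4.0, 1.0, 0.25]) # diagonal of Sigma_alpha

est, eta = bayes_alpha(alpha_ols, mu_alpha, sigma2, sigma2_alpha)

# Matrix form: (I + sigma^2 Sigma_alpha^{-1})^{-1} [alpha(0) + sigma^2 Sigma_alpha^{-1} A' mu]
S_inv = np.diag(1.0 / sigma2_alpha)
matrix_form = np.linalg.solve(np.eye(3) + sigma2 * S_inv,
                              alpha_ols + sigma2 * S_inv @ mu_alpha)

print(eta)                             # [0.8 0.5 0.2]
print(np.allclose(est, matrix_form))   # True
```

A large prior variance $\sigma^2_{\alpha i}$ pushes $\eta_i$ toward one (little shrinkage toward $a_i'\mu$), and a small one pushes it toward zero.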
The multivariate prior assumed on the covariance matrix $\Sigma_\beta$ in turn induces a prior distribution on the characteristic roots and on the transformation matrix $A$ [the roots $\sigma^2_{\alpha i}$ and $A$ have been defined in the context of the theorem in this section]. The matrix $A$ has the following important properties, by application of the theorem in Anderson (1984, p. 322) to the present context:

(1) The distribution of $A$ is a conditional Haar invariant distribution as long as the assumed prior density on $\Sigma_\beta$ depends solely on its eigenvalues.
(2) Further, the distribution of $A$ is independent of the distribution of the characteristic roots $\sigma^2_{\alpha i}$.


The second property of A is important for the Bayesian 
analysis proposed in the paper as seen in the next section. 


4.4: The Role of the Transformation 


The Bayes estimator of $\alpha$ in model (4.3) is seen from equation (4.4) to depend on $\eta_i$, which we will call the shrinkage factor. This depends only on the unknown variances $\sigma^2$ and $\sigma^2_{\alpha i}$, and hence is unknown. This analysis assumes that the prior mean $\mu$ is known, and hence $\eta_i$ is the only unknown factor in the estimator $\hat\alpha_i$; it is this factor that makes the estimator non-operational. The approach proposed here is to assume proper priors on $\sigma^2$ and $\Sigma_\beta$ and derive the posterior density of $\eta_i$ [this derivation is not done in this paper; see Bhat (1988) for details of one such derivation. Hill (1965, 1967) has derived the posterior density of the ratio of unknown variance components in the context of one-way analysis of variance models]. Once the posterior density of $\eta_i$ has been obtained, the shrinkage factor $\eta_i$ could be estimated by the posterior mean of $\eta_i$, the Bayes estimator of $\eta_i$. Further, the estimator $\hat\alpha_i$ with $\eta_i$ estimated by $E(\eta_i|\text{data})$ is

$$\hat\alpha_i = E(\eta_i|\text{data})\,\hat\alpha_i(0) + [1 - E(\eta_i|\text{data})]\,a_i'\mu$$

for all $i$.

The vector estimator $\hat\alpha$ is written as

$$\hat\alpha = E(\eta|\text{data})\,\hat\alpha(0) + [I - E(\eta|\text{data})]\,A'\mu,$$

where $E(\eta|\text{data}) = \mathrm{Diag}[E(\eta_1|\text{data}), \ldots, E(\eta_K|\text{data})]$. It is possible to write this step because $A$ is distributed independently of the $\sigma^2_{\alpha i}$ and hence of $\eta$.

The estimator $\hat\alpha$ can be computed given the data and the prior parameters only if the matrix $A$ can be computed. The matrix $A$ could be computed from the data if $\Sigma_\beta$ were known. However, $\Sigma_\beta$ is unknown, and hence $A$ is unknown and, as observed before, random. In the present context, we estimate $A$ as follows.

The multivariate prior on $\Sigma_\beta$ implies an expected value $E(\Sigma_\beta)$ for the matrix, given the prior parameters. The matrix $A$ can be computed for $\Sigma_\beta$ set equal to $E(\Sigma_\beta)$: given $X$ and this expected value of $\Sigma_\beta$, $A$ is observed and can be computed. Thus, $\hat\alpha$ can be computed.


Further, the Bayes estimator of $\beta$ can be recovered as

$$\hat\beta = (A')^{-1}E(\eta|\text{data})A'\hat\beta(0) + (A')^{-1}[I - E(\eta|\text{data})]A'\mu,$$

since $\alpha = A'\beta$ from the transformation. This yields a new, operational version of the Bayes estimator that can now be computed given the data and the prior parameters. The role of the transformation has been to provide the crucial intermediate step that enables us to express the transformed parameter vector $\alpha$ and the associated shrinkage factor in univariate terms. This in turn makes it possible to estimate the shrinkage factor and to obtain an operational version of the estimator $\hat\alpha$, which in turn yields a new version of the estimator $\hat\beta$.
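The recovery of $\hat\beta$ can be sketched as follows, under explicit assumptions: $A$ is fixed at an illustrative value of $E(\Sigma_\beta)$, and the posterior means $E(\eta_i|\text{data})$ are simply supplied as numbers, since their derivation lies outside this paper. The function names `diagonalizer` and `operational_beta` are hypothetical and the data are simulated.

```python
import numpy as np

def diagonalizer(xtx_inv, sigma_beta):
    """Hypothetical helper: A with A' xtx_inv A = I and A' sigma_beta A diagonal."""
    L_inv = np.linalg.inv(np.linalg.cholesky(xtx_inv))
    _, Q = np.linalg.eigh(L_inv @ sigma_beta @ L_inv.T)   # roots in ascending order
    return L_inv.T @ Q

def operational_beta(X, y, mu, expected_sigma_beta, expected_eta):
    """Sketch of beta-hat = (A')^{-1}[E(eta) A' b0 + (I - E(eta)) A' mu].

    expected_sigma_beta: prior mean E(Sigma_beta), used in place of Sigma_beta to fix A.
    expected_eta: assumed posterior means E(eta_i | data), in the column order of A.
    """
    xtx = X.T @ X
    beta_ols = np.linalg.solve(xtx, X.T @ y)
    A = diagonalizer(np.linalg.inv(xtx), expected_sigma_beta)
    E = np.diag(expected_eta)
    I = np.eye(len(mu))
    alpha_hat = E @ (A.T @ beta_ols) + (I - E) @ (A.T @ mu)
    return np.linalg.solve(A.T, alpha_hat), beta_ols     # beta = (A')^{-1} alpha

rng = np.random.default_rng(1)
X = rng.normal(size=(40, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(size=40)
mu = np.zeros(3)
E_sigma_beta = np.diag([2.0, 1.0, 0.5])                  # illustrative E(Sigma_beta)

beta_hat, beta_ols = operational_beta(X, y, mu, E_sigma_beta, np.array([0.9, 0.6, 0.3]))
```

Setting every $E(\eta_i|\text{data})$ to one returns the OLS estimator, and setting them all to zero returns the prior mean $\mu$; the operational estimator lies between these two limits.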


4.5: Conclusion 


The paper considers a multivariate transformation that 
is shown to be a critical intermediate step in obtaining a 
new, operational version of the Bayes estimator. The 
transformation reduces the original model to a simpler model 
with a diagonal structure in turn making it possible to write 
the Bayes estimator of the transformed parameter in 
univariate terms as a linear function of the unknown 
shrinkage factor. Estimating the unknown shrinkage factor 
by its Bayes estimator, a new operational version of the 
Bayes estimator is obtained. The role played by this multivariate transformation in obtaining this estimator of $\beta$ is the focus of this paper. The estimator obtained is an approximation of the true Bayes estimator to the extent that we have estimated $A$ from the prior parameters, the data and $E(\Sigma_\beta)$.


Notes 


1. It is also known that the Bayes estimator (under squared error loss) is the mean of the posterior distribution of the unknown parameter.

2. Theorem 4.1 is taken from Anderson (1984), Appendix I, and is applied to the present context.

3. The transformation matrix $A$ is unique if there are no multiple roots $\sigma^2_\alpha$ of the characteristic equation being considered and if the characteristic roots are ordered so that $\sigma^2_{\alpha 1} > \cdots > \sigma^2_{\alpha K} > 0$.
Extensions to the Linear Regression Model with Nonspherical Disturbances


The linear model discussed in Chapter 3 was exactly and exhaustively specified. The implicit assumption was that the researcher could specify a unique, small but exhaustive set of independent variables that constituted an adequate specification of the model. A strong assumption generally made along with this is that the error term, consisting of random influences, is spherically distributed. In this chapter, cases where the error term is nonspherically distributed are considered; two specific cases with such nonspherically distributed error terms are examined.


The first model that will be considered is the linear model with autocorrelated disturbances. Autocorrelated disturbances could emerge due to the problem of omitted variables, as argued by Johnston (1972), “if the serial correlation in the omitted variables tends to move in phase.” Autocorrelated disturbances imply that, when observations are made over time, the impact of the disturbance occurring in one period carries over into another time period. Serial correlation in the disturbance terms is more commonly encountered when working with time series data. The analogy between autocorrelated disturbances and the sound of a tapped musical string is well expressed by Kmenta (1971): “While the sound is loudest at the time of impact, it does not stop immediately but lingers on for a time until it finally dies off. This may also be the characteristic of the disturbance, since its effect may linger for some time after its occurrence. But while the effect of one disturbance lingers on, other disturbances take place, as if the musical string were tapped over and over, sometimes harder than at other times.” Just as the sound of the musical string can carry over into the next period if the time periods are short, so also there is a greater likelihood of encountering serially correlated disturbances when dealing with monthly or quarterly data rather than annual data. In such a case, the error term is no longer spherically distributed.


The case of the measurement errors in data is the second 
case considered here. In such a case, the error term 
incorporates the errors in measurement of the independent 
variables making it in general nonspherical. Such errors 
are quite likely in economic data, as they are usually taken 
from governmental or private publications which often report 
only the reliable or significant digits. 


This chapter extends the full Bayesian analysis presented 
in the last chapter to each of the two models considered 
that are more general in nature. A family of Bayes estimators 
will be derived in these instances. 


5.1: The Linear Regression Model with Autocorrelated 
Disturbances 


The linear model is considered with first-order serial correlation assumed for the error term. The model is written as

$$y = X\beta + \varepsilon \qquad (5.1)$$

where:

$X$: $T \times K$ matrix of explanatory variables, all of which are assumed to be stochastic and distributed independently of $(\beta, \sigma^2)$; the matrix $X'X$ is generally assumed to be nonsingular.

$y$: $T \times 1$ vector of observations on the dependent variable.

$\beta$: $K \times 1$ vector of unknown parameters of the model, assumed to have a normal prior, $N[\mu, \Sigma_\beta]$.

$\varepsilon$: $T \times 1$ vector of disturbances such that

$$E(\varepsilon|X) = 0, \qquad E(\varepsilon\varepsilon'|X) = \sigma^2 V.$$


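The matrix $V$ is a $T \times T$ matrix given by

$$V = \begin{pmatrix} 1 & \rho & \rho^2 & \cdots & \rho^{T-1} \\ \rho & 1 & \rho & \cdots & \rho^{T-2} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ \rho^{T-1} & \rho^{T-2} & \rho^{T-3} & \cdots & 1 \end{pmatrix}.$$

Further analysis depends on the matrix $V$, and hence we study its representation and an approximation that will be used in this analysis. Following Leamer (1978), p. 264, the inverse of $V$ can be represented as

$$V^{-1} = \frac{1-\rho}{1+\rho}\,I + \frac{\rho}{1-\rho^2}\,A + \frac{\rho}{1+\rho}\,B,$$

where

$$A = \begin{pmatrix} 1 & -1 & 0 & \cdots & 0 \\ -1 & 2 & -1 & \cdots & 0 \\ \vdots & \ddots & \ddots & \ddots & \vdots \\ 0 & \cdots & -1 & 2 & -1 \\ 0 & \cdots & 0 & -1 & 1 \end{pmatrix}$$

and

$$B = \begin{pmatrix} 1 & 0 & \cdots & 0 \\ 0 & 0 & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & \cdots & 0 & 1 \end{pmatrix}.$$

(Here $A$ and $B$ are auxiliary matrices, not to be confused with the transformation matrix $A$ used later in the chapter.)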


The matrix $B$ is almost a zero matrix and, furthermore, as an approximation, the matrix $A$ is considered negligible, so that we can omit both matrices $A$ and $B$ without great error and, as an approximation, write

$$V^{-1} \approx \frac{1-\rho}{1+\rho}\,I = \rho^* I, \qquad \rho^* = \frac{1-\rho}{1+\rho}.$$
This approximation would be good in cases where $\rho$ is small, because the coefficient of the matrix $A$ would then be relatively insignificant. In cases where $\rho$ is large, this approximation would not be very good and a more general analysis would be needed. However, without this approximation, the analysis becomes very complicated and no explicit solution is possible unless $\Sigma_\beta$ is assumed to be proportional to the identity matrix. However, this would make the analysis less general. Thus, there is a trade-off, and in the light of this trade-off we persist with the approximation. We hope that the case considered here will give insight into analysing models with autocorrelation.
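The representation of $V^{-1}$ and the quality of the approximation $V^{-1} \approx \rho^* I$ can be checked numerically. This sketch takes $V$ to have $(i, j)$ element $\rho^{|i-j|}$ and builds the auxiliary matrices $A$ and $B$ as described; $T = 8$ and the values of $\rho$ are arbitrary choices, and the function names are hypothetical. The relative error of the approximation grows with $\rho$, in line with the remark that it is good only when $\rho$ is small.

```python
import numpy as np

def ar1_matrix(T, rho):
    """V with (i, j) element rho^{|i - j|}."""
    idx = np.arange(T)
    return rho ** np.abs(idx[:, None] - idx[None, :])

def leamer_parts(T):
    """Tridiagonal A (ends 1, interior 2, off-diagonal -1) and corner matrix B."""
    A = 2.0 * np.eye(T) - np.eye(T, k=1) - np.eye(T, k=-1)
    A[0, 0] = A[-1, -1] = 1.0
    B = np.zeros((T, T))
    B[0, 0] = B[-1, -1] = 1.0
    return A, B

def approximation_error(T, rho):
    """Relative Frobenius error of the approximation V^{-1} ~ rho_star * I."""
    V_inv = np.linalg.inv(ar1_matrix(T, rho))
    rho_star = (1 - rho) / (1 + rho)
    return np.linalg.norm(V_inv - rho_star * np.eye(T)) / np.linalg.norm(V_inv)

T, rho = 8, 0.3
A, B = leamer_parts(T)
rho_star = (1 - rho) / (1 + rho)
exact = rho_star * np.eye(T) + rho / (1 - rho**2) * A + rho / (1 + rho) * B
print(np.allclose(np.linalg.inv(ar1_matrix(T, rho)), exact))   # True

for r in (0.1, 0.5, 0.9):
    print(r, round(approximation_error(T, r), 3))
```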

We will be considering only the nonexplosive case where the autocorrelation coefficient satisfies

$$|\rho| < 1.$$


As discussed earlier, in the classical framework this model is a random effects model and can be reduced to simpler models using a transformation that simultaneously diagonalizes

$$(X'V^{-1}X)^{-1} \approx (1/\rho^*)(X'X)^{-1}$$

and $\Sigma_\beta$. Applying this transformation, we obtain a simplified model that will be studied in detail.


Consider the model specified in (5.1) and assume that both $\Sigma_\beta$ and $X'X$ are positive definite. There exists a nonsingular, random $(K \times K)$ matrix $A$, which is the same matrix considered in Chapter 3 and which depends on $X$ and $\Sigma_\beta$; the randomness in the matrix $A$ comes from the uncertainty in $\Sigma_\beta$, as the entire analysis will be conditioned on the data $X$. This matrix $A$ is such that

$$A'(X'X)^{-1}A = I \quad\text{and}\quad A'\Sigma_\beta A = \Sigma_\alpha,$$

where $\Sigma_\alpha = \mathrm{Diag}(\sigma^2_{\alpha 1}, \sigma^2_{\alpha 2}, \ldots, \sigma^2_{\alpha K})$ and the $\sigma^2_{\alpha i}$, for all $i$, are the roots of the determinantal equation $|\Sigma_\beta - \sigma^2_\alpha (X'X)^{-1}| = 0$. Since $\Sigma_\beta$ is positive definite, $\sigma^2_{\alpha i} > 0$ for all $i$, and the roots are ordered so that $\sigma^2_{\alpha 1} > \sigma^2_{\alpha 2} > \cdots > \sigma^2_{\alpha K}$. The transformation matrix $A$ depends on $X'X$ and $\Sigma_\beta$. This simultaneous diagonalization of $(X'X)^{-1}$ and $\Sigma_\beta$ is based on the theorem stated without proof in Chapter 3.
The model in (5.1) is now written as

$$y = Z\alpha + \varepsilon \qquad (5.2)$$

where $Z = X(A')^{-1}$, $Z'Z = I$ and $\alpha = A'\beta$.
The model (5.2) is obtained from model (5.1) by reparametrizing the former model. Our entire analysis will be conditional on the dataset $X$, and hence model (5.2) will also be conditional on $X$. The randomness in $A$ will be due to the uncertainty in $\Sigma_\beta$. Hence, the prior assumed on $(\beta, \Sigma_\beta)$ induces a prior distribution on $(\alpha, \Sigma_\alpha)$, which depends on the transformation matrix. We obtain the prior distribution of $\alpha$ as follows:

$$E(\alpha|\Sigma_\beta, X) = A'\mu \quad\text{and}\quad \mathrm{Cov}(\alpha|\Sigma_\beta, X) = A'\Sigma_\beta A = \Sigma_\alpha.$$

Given $\Sigma_\beta$ and $X$, the transformation matrix is completely determined, and hence the prior on $\alpha$ is multivariate normal, $N[A'\mu, \Sigma_\alpha]$. As discussed earlier, we make the analysis more complete and robust by incorporating into the analysis the induced priors on the prior variances $\sigma^2_{\alpha i}$ for all $i$, as these priors allow for variation in the elements of $\Sigma_\alpha$ and correspondingly in $\Sigma_\beta$.
Sample evidence is then combined with prior information on the unknown parameters to yield the posterior distribution of the unknown parameters by the application of Bayes’ theorem. This will enable us to obtain the posterior distribution of the shrinkage factors of the ridge estimators. Using a Taylor series expansion, the posterior mean of the shrinkage factors is expressed as a ratio of infinite sums of hypergeometric functions. This enables us to obtain admissible Bayes estimators of the unknown parameters in the model. Important special cases will be considered.


5.1.1: The Posterior Distribution of $(\sigma^2_{\alpha 1}, \ldots, \sigma^2_{\alpha K}, \sigma^{2*})$


The focus of this section will be to derive the posterior distribution of $(\sigma^2_{\alpha 1}, \ldots, \sigma^2_{\alpha K}, \sigma^{2*})$ in the context of the linear model (5.2). Next, we define shrinkage factors in terms of the variance components $(\sigma^2_{\alpha 1}, \ldots, \sigma^2_{\alpha K}, \sigma^{2*})$. The joint posterior distribution of these shrinkage factors is obtained. This will enable us to obtain the Bayes estimator of $\alpha$ and hence that of $\beta$. We will begin with a brief review of the Bayes estimator of $\alpha$ that has been discussed in Chapter 2.
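The Bayes estimator of $\alpha$ for the model in (5.2) is written as

$$\hat\alpha(B) = E[\alpha|y, X, \Sigma_\beta, \rho] = \left(Z'Z + \frac{\sigma^2}{\rho^*}\Sigma_\alpha^{-1}\right)^{-1}(Z'Z)\hat\alpha(0) + \left[I - \left(Z'Z + \frac{\sigma^2}{\rho^*}\Sigma_\alpha^{-1}\right)^{-1}(Z'Z)\right]A'\mu.$$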
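In terms of the reparametrized model (5.2), the GRE of $\alpha$ is written as

$$\hat\alpha(K) = (Z'Z + K)^{-1}(Z'Z)\hat\alpha(0)$$

given $K$, $X$, $y$, $\Sigma_\beta$; here $\hat\alpha(0)$ is the feasible Aitken GLS estimator of $\alpha$ and $K = (\sigma^2/\rho^*)\Sigma_\alpha^{-1}$. Thus,

$$\hat\alpha_i(K) = \frac{\sigma^2_{\alpha i}}{\sigma^2_{\alpha i} + \sigma^2/\rho^*}\,\hat\alpha_i(0), \qquad i = 1, 2, \ldots, K,$$

given $\sigma^2$, $\Sigma_\beta$ and $\rho$, is the ridge estimator of $\alpha_i$. This follows from the fact that $Z'Z = I$. For the case when the prior mean on $\beta$, and hence on $\alpha$, is zero, i.e., when $\mu = 0$, the Bayes estimator coincides with the ridge estimator, that is, $\hat\alpha(K) = \hat\alpha(B)$, and hence the Bayes estimator $\hat\beta(B)$ of $\beta$ equals $\hat\beta(K)$. Without loss of generality, it will be assumed in the rest of the analysis in this chapter that the prior mean on $\beta$ is zero, that is, $\mu = 0$. Thus,

$$\hat\alpha_i(B) = \hat\alpha_i(K) = \frac{\sigma^2_{\alpha i}}{\sigma^2_{\alpha i} + \sigma^2/\rho^*}\,\hat\alpha_i(0), \qquad i = 1, 2, \ldots, K,$$

given $\sigma^2$, $\Sigma_\beta$, $X$, $y$ and $\rho$. Hence, the Bayes estimator of $\alpha_i$ is written as

$$\hat\alpha_i(A) = E\left[\frac{\sigma^2_{\alpha i}}{\sigma^2_{\alpha i} + \sigma^2/\rho^*}\,\middle|\, y, X\right]\hat\alpha_i(0) \qquad \text{for all } i.$$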
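To see why the ridge form and the Bayes form coincide when $\mu = 0$, the sketch below compares the componentwise shrinkage $\sigma^2_{\alpha i}/(\sigma^2_{\alpha i} + \sigma^2/\rho^*)$ applied to $\hat\alpha(0)$ with a direct normal posterior-mean computation, under the approximation $V^{-1} \approx \rho^* I$ and with $Z'Z = I$. The values of $\rho$, $\sigma^2$ and the $\sigma^2_{\alpha i}$, and the simulated $Z$ and $y$, are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
T, K = 40, 3
Z, _ = np.linalg.qr(rng.normal(size=(T, K)))   # orthonormal columns, so Z'Z = I
y = rng.normal(size=T)

rho, sigma2 = 0.2, 1.5                          # illustrative values
rho_star = (1 - rho) / (1 + rho)                # V^{-1} approximated by rho_star * I
sigma2_alpha = np.array([3.0, 1.0, 0.2])        # illustrative prior variances; mu = 0

alpha0 = Z.T @ y                                # GLS estimate when V^{-1} ~ rho_star I, Z'Z = I
eta = sigma2_alpha / (sigma2_alpha + sigma2 / rho_star)
ridge = eta * alpha0                            # componentwise alpha_i(K)

# Posterior mean with alpha ~ N(0, Sigma_alpha) and y | alpha ~ N(Z alpha, (sigma2 / rho_star) I).
precision = (rho_star / sigma2) * (Z.T @ Z) + np.diag(1.0 / sigma2_alpha)
post_mean = np.linalg.solve(precision, (rho_star / sigma2) * (Z.T @ y))

print(np.allclose(ridge, post_mean))            # True
```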


We shall derive and study the posterior distribution of $\sigma^2_{\alpha i}/(\sigma^2_{\alpha i} + \sigma^2/\rho^*)$, employing prior knowledge on $(\sigma^2, \rho^*, \sigma^2_{\alpha 1}, \ldots, \sigma^2_{\alpha K})$. The prior on $(\sigma^2_{\alpha 1}, \ldots, \sigma^2_{\alpha K})$ is induced by the prior that is assumed on $\Sigma_\beta$ and depends on $X$. It represents an updating of the investigator’s beliefs about $(\sigma^2_{\alpha 1}, \ldots, \sigma^2_{\alpha K})$. Since $\rho \in (-1, 1)$, $\rho^* \in (0, \infty)$, and we define a new parameter $\sigma^{2*} = \sigma^2/\rho^*$, which also varies between zero and $\infty$. We assume that the investigator places inverted gamma priors on $\sigma^{2*}$ and $\sigma^2$. This implies an inverted beta prior on $\rho^*$, and the prior induced on $\rho$ can be obtained from this prior on $\rho^*$. Alternatively, we could have assumed a prior on $\rho$ directly. The priors for $\sigma^2$ and $\sigma^{2*}$ are taken from the inverted gamma family given by the class P:

$$p(\sigma^2|\lambda_0, C_0) \propto (\sigma^2)^{-\lambda_0/2-1}\exp[-C_0/(2\sigma^2)],$$

$$p(\sigma^{2*}|\lambda_1, C_1) \propto (\sigma^{2*})^{-\lambda_1/2-1}\exp[-C_1/(2\sigma^{2*})],$$

where $\lambda_0, \lambda_1 > 0$ and $C_0, C_1 \ge 0$. This prior on $\sigma^{2*}$, together with the inverted gamma prior on $\sigma^2$, implies an inverted beta prior on $\rho^*$ with parameters $(\lambda_0/2 + 2)$ and $\lambda_1/2$.
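A small simulation sketch of the class P priors: it draws $\sigma^2$ and $\sigma^{2*}$ independently from the stated inverted gamma kernels (using the fact that $1/\sigma^2$ is then gamma distributed with shape $\lambda_0/2$ and rate $C_0/2$), forms $\rho^* = \sigma^2/\sigma^{2*}$, and maps back to $\rho = (1-\rho^*)/(1+\rho^*)$, which inverts $\rho^* = (1-\rho)/(1+\rho)$. The hyperparameter values are arbitrary, $C_0, C_1 > 0$ is assumed so that the draws are proper, and the sketch does not attempt to verify the inverted beta parameters quoted in the text.

```python
import numpy as np

def sample_inverted_gamma(rng, lam, C, size):
    """Draws with density proportional to s^(-lam/2 - 1) * exp(-C / (2 s)).

    If 1/s ~ Gamma(shape=lam/2, rate=C/2), then s has this density (C > 0 assumed).
    """
    return 1.0 / rng.gamma(shape=lam / 2.0, scale=2.0 / C, size=size)

rng = np.random.default_rng(2)
lam0, C0, lam1, C1 = 6.0, 4.0, 8.0, 6.0              # illustrative hyperparameters
n = 100_000
sigma2 = sample_inverted_gamma(rng, lam0, C0, n)       # sigma^2
sigma2_star = sample_inverted_gamma(rng, lam1, C1, n)  # sigma^{2*}, drawn independently

rho_star = sigma2 / sigma2_star                  # from sigma^{2*} = sigma^2 / rho*
rho = (1.0 - rho_star) / (1.0 + rho_star)        # inverts rho* = (1 - rho) / (1 + rho)
print(np.quantile(rho, [0.05, 0.5, 0.95]))
```

Because $\rho^*$ is positive, every draw of $\rho$ falls in $(-1, 1)$, so this route automatically respects the nonexplosive restriction $|\rho| < 1$.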

The prior assumed on $\Sigma_\beta$ is an inverted Wishart distribution

$$p(\Sigma_\beta|v, K, \Omega) \propto |\Sigma_\beta|^{-(v+K+1)/2}\exp\left[-\tfrac12\operatorname{tr}\left(\Sigma_\beta^{-1}\Omega\right)\right],$$

where $v$, $K$ are the parameters of the inverted Wishart distribution and $\Omega$ is a symmetric positive definite matrix. For priors on a covariance matrix, it is reasonable to assume an inverted Wishart prior. This in turn induces a prior on the characteristic roots $\sigma^2_{\alpha i}$ of $\Sigma_\beta$ in the metric of $(X'X)^{-1}$, which we derive as follows. The characteristic roots $\sigma^2_{\alpha i}$, as seen before, satisfy the following equation, given $\Sigma_\beta$:

$$\left|\Sigma_\beta - \sigma^2_\alpha (X'X)^{-1}\right| = 0.$$

By this result and the theorem stated before, there exists a nonsingular $(K \times K)$ matrix $C$ such that

$$C(X'X)^{-1}C' = \Lambda^{-1}$$

and

$$C\Omega C' = I.$$
The elements of $\Lambda^{-1}$ are the eigenvalues of $(X'X)^{-1}$ in the metric of $\Omega$. As seen earlier in Section 3.1, $H^{-1} = (C')^{-1}\Sigma_\beta^{-1}C^{-1}$ has a Wishart distribution with parameters $v$, $K$ and $I$ [see Anderson (1984), p. 162]. Thus, the symmetric matrix $H$ defined by $H = C\Sigma_\beta C'$ has an inverted Wishart distribution with parameters $v$, $K$ and $I$. It was shown that the symmetric matrix $B$ defined as

$$B = \Lambda^{1/2}H\Lambda^{1/2}$$

has as its characteristic roots the diagonal elements of the matrix $\Sigma_\alpha$. As reasoned above, $B$ has an inverted Wishart distribution with parameters $v$, $K$ and $\Lambda$. Starting with the assumed prior on $\Sigma_\beta$, we can obtain the induced prior on the characteristic roots of $B$.


We shall discuss the general case first and then consider important special cases. Using the result of Theorem 3.1 stated in Chapter 3, we obtain the prior on $(\sigma^2_{\alpha 1}, \sigma^2_{\alpha 2}, \ldots, \sigma^2_{\alpha K})$ as

$$p(\sigma^2_{\alpha 1}, \ldots, \sigma^2_{\alpha K}) \propto \prod_{i=1}^{K}(\sigma^2_{\alpha i})^{-(v+K+1)/2}\exp\left(-\frac{\lambda_0}{2\sigma^2_{\alpha i}}\right)\prod_{i<j}\left(\frac{1}{2\sigma^2_{\alpha i}} - \frac{1}{2\sigma^2_{\alpha j}}\right)\sum_{j=0}^{\infty}\sum_{\kappa(j)}\frac{C_\kappa(\Theta)\,C_\kappa(\Sigma_\alpha^{-1})}{j!\,C_\kappa(I_K)}$$

where

$\kappa(j)$: a notation for a partition of the number $j$; a partition of weight $r$ is a set of $r$ positive integers $(j_1, \ldots, j_r)$ such that $\sum_{i=1}^{r} j_i = j$.
$C_\kappa(Z)$: the zonal polynomial corresponding to the partition $\kappa(j)$, which is a symmetric homogeneous polynomial function of degree $j$ of the latent roots of $Z$.
$\Theta$: a diagonal matrix with elements $\theta_i$ for all $i$, where $\theta_i = (\lambda_0 - \lambda_i)/2$.
$I_K$: the $K$-dimensional identity matrix.
Since it is not possible to write out the zonal polynomials $C_\kappa(Z)$ explicitly in the general case [see Johnson and Kotz (1972), pp. 170-175], further analysis becomes very difficult. Hence, we shall consider some important special cases, and one of these will be developed in further analysis.


1. In some cases, the eigenvalues $\lambda_i$ (the diagonal elements of $\Lambda$) might be approximately equal, so that we can assume that

$$\lambda_1 = \lambda_2 = \cdots = \lambda_K = \lambda.$$

Following the same reasoning as in Chapter 3, we can obtain the joint prior distribution of the ordered roots as

$$p(\sigma^2_{\alpha 1}, \ldots, \sigma^2_{\alpha K}) \propto \prod_{i=1}^{K}(\sigma^2_{\alpha i})^{-(v+K+1)/2}\exp\left(-\frac{\lambda}{2\sigma^2_{\alpha i}}\right)\prod_{i<j}(\sigma^2_{\alpha i} - \sigma^2_{\alpha j}).$$

Let $G$ be the matrix of normalized (to one) characteristic vectors of the matrix $B$. Applying the result in Theorem 13.3.4 of Anderson (1984), p. 322, to the matrix $B$, we see that $G$ is distributed independently of the characteristic roots $\sigma^2_{\alpha i}$, for all $i$. $G$ is distributed according to a conditional Haar invariant distribution. These results will be used later.


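2. Another special case arises when some of the eigenvalues are near zero and the others are all equal, so that we assume that

$$\lambda_1 = \lambda_2 = \cdots = \lambda_m = \lambda$$

and

$$\lambda_{m+1} = \lambda_{m+2} = \cdots = \lambda_K = 0.$$

Let $H_m$ denote the $m$-dimensional submatrix of $H$. Then, by the result on p. 395 of Zellner (1971), $H_m$ has an inverted Wishart distribution with parameters $\lambda I_m$, $m$ and $(v - K + m)$. By the same reasoning as in Chapter 3, the density of the characteristic roots of $H_m$ is

$$p(\sigma^2_{\alpha 1}, \ldots, \sigma^2_{\alpha m}) \propto \prod_{i=1}^{m}(\sigma^2_{\alpha i})^{-(v+K+1)/2}\exp\left(-\frac{\lambda}{2\sigma^2_{\alpha i}}\right)\prod_{i<j}(\sigma^2_{\alpha i} - \sigma^2_{\alpha j}).$$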
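We shall consider the first case, which corresponds to a single common eigenvalue $\lambda$. In this case, the joint prior on $(\sigma^2_{\alpha 1}, \sigma^2_{\alpha 2}, \ldots, \sigma^2_{\alpha K})$ is

$$p(\sigma^2_{\alpha 1}, \ldots, \sigma^2_{\alpha K}) \propto \prod_{i=1}^{K}(\sigma^2_{\alpha i})^{-(v+K+1)/2}\exp\left(-\frac{\lambda}{2\sigma^2_{\alpha i}}\right)\prod_{i<j}(\sigma^2_{\alpha i} - \sigma^2_{\alpha j}).$$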
i=] 
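Incorporating the priors on $\sigma^2$ and $\sigma^{2*}$, we find that the joint prior of $(\sigma^2_{\alpha 1}, \ldots, \sigma^2_{\alpha K}, \sigma^{2*})$ is

$$p(\sigma^2_{\alpha 1}, \ldots, \sigma^2_{\alpha K}, \sigma^{2*}) \propto (\sigma^{2*})^{-\lambda_1/2-1}\left[\prod_{i=1}^{K}(\sigma^2_{\alpha i})^{-(v+K+1)/2}\right]g(\sigma^2_{\alpha i}, \sigma^2_{\alpha j})\exp\left(-\tfrac12\left[\frac{C_1}{\sigma^{2*}} + \sum_{i=1}^{K}\frac{\lambda}{\sigma^2_{\alpha i}}\right]\right),$$

where $g(\sigma^2_{\alpha i}, \sigma^2_{\alpha j}) = \prod_{i<j}(\sigma^2_{\alpha i} - \sigma^2_{\alpha j})$, with $v$, $\lambda$, $\lambda_1$ and $C_1$ given. This is because the prior assumed on $\sigma^{2*}$ and the priors induced on $\sigma^2_{\alpha i}$, for all $i$, are all distributed independently of each other.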


The likelihood function of the sample as a function of $\alpha$, $\sigma^2$ and $\rho^*$ is

$$L(\alpha, \sigma^2, \rho^*) \propto (\sigma^2)^{-T/2}\exp\left[-\frac{1}{2\sigma^2}(y - Z\alpha)'V^{-1}(y - Z\alpha)\right],$$


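where $\alpha$ is a function of $\Sigma_\beta$ and $X$. Since a prior is assumed on $\alpha$, from the Bayesian point of view the likelihood function can be viewed as a function of $(\sigma^2, \sigma^2_{\alpha 1}, \ldots, \sigma^2_{\alpha K}, \sigma^{2*})$ and is derived as follows. The distribution of $y$ is obtained by incorporating the prior on $\alpha$. The first two moments of $y$ are:

1. $E(y|X, \Sigma_\beta, \rho) = 0$;
2. $\mathrm{Cov}(y|X, \Sigma_\beta, \rho) = E(Z\alpha\alpha'Z'|X, \Sigma_\beta, \rho) + E(\varepsilon\varepsilon'|X, \Sigma_\beta, \rho) + E(Z\alpha\varepsilon'|X, \Sigma_\beta, \rho) + E(\varepsilon\alpha'Z'|X, \Sigma_\beta, \rho)$.

Since

$$E[Z\alpha\varepsilon'|X, \Sigma_\beta, \rho] = E[E(Z\alpha\varepsilon'|X, \Sigma_\beta, \alpha, \rho)] = E[Z\alpha\,E(\varepsilon'|X, \Sigma_\beta, \alpha, \rho)] = 0,$$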
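and likewise for its transpose, we write, with $\sigma^{2*} = \sigma^2/\rho^*$ and the approximation $V \approx (1/\rho^*)I$,

$$\Sigma \equiv \mathrm{Cov}(y|X, \Sigma_\beta, \rho) = \sigma^{2*}[I + Z\Gamma Z'],$$

where

$$\Gamma = \mathrm{Diag}(\rho^*\sigma^2_{\alpha 1}/\sigma^2, \ldots, \rho^*\sigma^2_{\alpha K}/\sigma^2) = \mathrm{Diag}(\sigma^2_{\alpha 1}/\sigma^{2*}, \ldots, \sigma^2_{\alpha K}/\sigma^{2*}).$$

Hence, the distribution of $y|X, \Sigma_\beta, \rho$ is $N[0, \Sigma]$, and the likelihood function is written as

$$L(\sigma^2_{\alpha 1}, \ldots, \sigma^2_{\alpha K}, \sigma^{2*}) \propto |\Sigma|^{-1/2}\exp\left(-\tfrac12 y'\Sigma^{-1}y\right),$$

where $y$ is as defined earlier. Using result T8 in Appendix I, p. 324 of Leamer (1978), to substitute for $\Sigma^{-1}$, and on further simplification [see Appendix F, F.1 for further details], we obtain the likelihood function as

$$L(\sigma^2_{\alpha 1}, \ldots, \sigma^2_{\alpha K}, \sigma^{2*}) \propto (\sigma^{2*})^{-T/2}\prod_{i=1}^{K}\left(1 + \frac{\sigma^2_{\alpha i}}{\sigma^{2*}}\right)^{-1/2}\exp\left\{-\frac{1}{2\sigma^{2*}}\left[y'y - p'(I + \Gamma^{-1})^{-1}p\right]\right\},$$

where $p = Z'y$.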
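The reduction of the $T$-dimensional likelihood to $K$-dimensional quantities rests on $Z'Z = I$. The sketch below checks, for simulated $Z$ and $y$ and illustrative variances (assumptions, not values from the text), that $y'\Sigma^{-1}y = [y'y - p'(I+\Gamma^{-1})^{-1}p]/\sigma^{2*}$ and $\log|\Sigma| = T\log\sigma^{2*} + \sum_i \log(1 + \sigma^2_{\alpha i}/\sigma^{2*})$ when $\Sigma = \sigma^{2*}(I + Z\Gamma Z')$. It is a numerical check of the algebra, not a reproduction of the derivation in Appendix F.

```python
import numpy as np

rng = np.random.default_rng(3)
T, K = 30, 4
Z, _ = np.linalg.qr(rng.normal(size=(T, K)))   # orthonormal columns: Z'Z = I
y = rng.normal(size=T)
sigma2_star = 0.7                               # illustrative sigma^{2*}
sigma2_alpha = np.array([2.0, 1.0, 0.5, 0.1])   # illustrative sigma^2_{alpha i}
gamma = sigma2_alpha / sigma2_star              # diagonal of Gamma

Sigma = sigma2_star * (np.eye(T) + Z @ np.diag(gamma) @ Z.T)

# Direct T x T computations.
quad_direct = y @ np.linalg.solve(Sigma, y)
_, logdet_direct = np.linalg.slogdet(Sigma)

# Reduced forms that only involve K-dimensional quantities.
p = Z.T @ y
quad_reduced = (y @ y - p @ (p / (1.0 + 1.0 / gamma))) / sigma2_star
logdet_reduced = T * np.log(sigma2_star) + np.sum(np.log1p(gamma))

print(np.isclose(quad_direct, quad_reduced), np.isclose(logdet_direct, logdet_reduced))
```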


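By Bayes’ theorem, the posterior distribution of $(\sigma^2_{\alpha 1}, \ldots, \sigma^2_{\alpha K}, \sigma^{2*})$ is proportional to the product of the joint prior of $(\sigma^2_{\alpha 1}, \ldots, \sigma^2_{\alpha K}, \sigma^{2*})$ and the likelihood function, and hence the posterior distribution of $(\sigma^2_{\alpha 1}, \ldots, \sigma^2_{\alpha K}, \sigma^{2*})$ is

$$p(\sigma^2_{\alpha 1}, \ldots, \sigma^2_{\alpha K}, \sigma^{2*}|y, X) \propto (\sigma^{2*})^{-(T+\lambda_1)/2-1}\prod_{i=1}^{K}h(\sigma^2_{\alpha i})\prod_{i<j}(\sigma^2_{\alpha i} - \sigma^2_{\alpha j})\exp\left(-\tfrac12\left[\frac{C_1}{\sigma^{2*}} + H(y, X, \Gamma)\right]\right) \qquad (5.3)$$

where

$$h(\sigma^2_{\alpha i}) = (\sigma^2_{\alpha i})^{-(v+K+1)/2}\left(1 + \frac{\sigma^2_{\alpha i}}{\sigma^{2*}}\right)^{-1/2}$$

and

$$H(y, X, \Gamma) = \frac{1}{\sigma^{2*}}\left[y'y - p'(I + \Gamma^{-1})^{-1}p\right] + \sum_{i=1}^{K}\frac{\lambda}{\sigma^2_{\alpha i}},$$

and $y$, $X$ and $\lambda$ are given.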


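Consider the transformation to the $(\sigma^{2*}, \eta_1, \ldots, \eta_K)$ space, where

$$\eta_i = \frac{\sigma^2_{\alpha i}}{\sigma^2_{\alpha i} + \sigma^{2*}} \quad\text{for all } i, \qquad \sigma^{2*} = \sigma^{2*}.$$

The inverse transformation is given by

$$\sigma^2_{\alpha i} = \frac{\sigma^{2*}\eta_i}{1 - \eta_i} \quad\text{for all } i, \qquad \sigma^{2*} = \sigma^{2*}.$$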


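Lemma 5.1: The Jacobian of the transformation from the $(\sigma^2_{\alpha 1}, \ldots, \sigma^2_{\alpha K}, \sigma^{2*})$ space to the $(\sigma^{2*}, \eta_1, \ldots, \eta_K)$ space (where $\eta_i$ is as defined above) is given by

$$|J| = (\sigma^{2*})^{K}\prod_{i=1}^{K}(1 - \eta_i)^{-2}.$$

Proof:
The Jacobian of the transformation is

$$|J| = \left|\frac{\partial(\sigma^{2*}, \sigma^2_{\alpha 1}, \ldots, \sigma^2_{\alpha K})}{\partial(\sigma^{2*}, \eta_1, \ldots, \eta_K)}\right| = |Q|,$$

where the matrix $Q$ is given by

$$Q = \begin{pmatrix} 1 & 0 & 0 & \cdots & 0 \\ \eta_1/(1-\eta_1) & \sigma^{2*}/(1-\eta_1)^2 & 0 & \cdots & 0 \\ \vdots & \vdots & \ddots & & \vdots \\ \eta_K/(1-\eta_K) & 0 & \cdots & 0 & \sigma^{2*}/(1-\eta_K)^2 \end{pmatrix}.$$

Thus, since $Q$ is lower triangular, $|J|$ is the product of its diagonal elements, $(\sigma^{2*})^{K}\prod_{i=1}^{K}(1 - \eta_i)^{-2}$.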
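The Jacobian in Lemma 5.1 can be checked by finite differences. The sketch maps $(\sigma^{2*}, \eta_1, \ldots, \eta_K)$ to $(\sigma^2_{\alpha 1}, \ldots, \sigma^2_{\alpha K}, \sigma^{2*})$ through $\sigma^2_{\alpha i} = \sigma^{2*}\eta_i/(1-\eta_i)$ and compares the absolute determinant with $(\sigma^{2*})^K\prod_i(1-\eta_i)^{-2}$; the evaluation point and step size are arbitrary choices, and the function names are hypothetical.

```python
import numpy as np

def to_variances(theta):
    """Map (s, eta_1..eta_K) -> (sigma2_alpha_1..K, s), where s = sigma^{2*}."""
    s, eta = theta[0], theta[1:]
    return np.concatenate([s * eta / (1.0 - eta), [s]])

def numerical_jacobian(f, x, h=1e-6):
    """Central-difference Jacobian: column j holds the derivatives w.r.t. x[j]."""
    cols = []
    for j in range(x.size):
        e = np.zeros_like(x)
        e[j] = h
        cols.append((f(x + e) - f(x - e)) / (2.0 * h))
    return np.column_stack(cols)

s = 1.3
eta = np.array([0.2, 0.5, 0.7])
theta = np.concatenate([[s], eta])

J = numerical_jacobian(to_variances, theta)
closed_form = s ** eta.size * np.prod((1.0 - eta) ** -2)
print(np.isclose(abs(np.linalg.det(J)), closed_form, rtol=1e-6))   # True
```

The ordering of the variables only changes the sign of the determinant, which is why the absolute value is compared.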
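The posterior distribution of $(\sigma^{2*}, \eta_1, \ldots, \eta_K)$ is

$$p(\sigma^{2*}, \eta_1, \ldots, \eta_K|y, X) \propto (\sigma^{2*})^{K - T_0/2}\prod_{i=1}^{K}\left[(1-\eta_i)^{(v+K)/2-1}\eta_i^{-(v+K+1)/2}\right]f(\eta_i, \eta_j)\exp\left\{-\frac{1}{2\sigma^{2*}}\left[Q_0 + \sum_{i=1}^{K}\frac{\lambda(1-\eta_i)}{\eta_i} - \sum_{i=1}^{K}p_i^2\eta_i\right]\right\},$$

where $0 < \eta_1, \ldots, \eta_K < 1$ and $p_i$ is the $i$th element of $p = Z'y$. We define

$$f(\eta_i, \eta_j) = \prod_{i<j}\frac{\eta_i - \eta_j}{(1-\eta_i)(1-\eta_j)},$$

$Q_0 = (C_1 + y'y)$, and $T_0 = (T + \lambda_1 + v + K + 1)$. The posterior distribution of $(\eta_1, \ldots, \eta_K)$ is obtained by integrating with respect to $\sigma^{2*}$:

$$p(\eta_1, \ldots, \eta_K|y, X) \propto \frac{\prod_{i=1}^{K}\left[(1-\eta_i)^{(v+K)/2-1}\eta_i^{-(v+K+1)/2}\right]f(\eta_i, \eta_j)}{[Q(\eta_1, \ldots, \eta_K)]^{\Delta/2}},$$

where $\Delta = (\lambda_1 + v + T - K - 1)$ and

$$Q(\eta_1, \ldots, \eta_K) = Q_0 + \sum_{i=1}^{K}\left[\frac{\lambda(1-\eta_i)}{\eta_i} - p_i^2\eta_i\right].$$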
The posterior density of $(\eta_1, \eta_2, \ldots, \eta_K)$ is very similar to the posterior density derived in Chapter 3, and by proceeding on similar lines and using the multinomial expansion described in Chapter 3, we can write the posterior density of $(\eta_1, \ldots, \eta_K)$ as


ok i 
pinta) © >, Dy Du 


k=0 {I}=0 Uij=0 4j= 


(2) 
uo 
It 
Oo 
ame 
pal 
= 
Thee 
N 
+ 
»| 
=— 
N 


ifall 7, e (0, 1), 


where | 
2() = 2(k,LA, j,0;, (dy), (i)» (2), Q0-dn) 
= in+h+d, 7 eas dy ,2di2 (5 di 
(-1j2th4e g, (A / 2)*1 M(rmg)"" (Qo) Gj? (A)® 
i=l 


n K 
T ah Ido! | a! dia!) 
1=] ea 


and h; =d,. + (i-2)h-dj;, +1-(v+ K +1)/2. Thus, 


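$$E[\eta_i|y, X] = \frac{\int_0^1\cdots\int_0^1 \eta_i\,p(\eta_1, \ldots, \eta_K|y, X)\prod_{k=1}^{K}d\eta_k}{\int_0^1\cdots\int_0^1 p(\eta_1, \ldots, \eta_K|y, X)\prod_{k=1}^{K}d\eta_k}$$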

In terms of the hypergeometric function discussed before and in Appendix D, given by $F_1[A, B, C; Z_{2i-1}, Z_{2i}]$, we can obtain $E[\eta_i|y, X]$ as follows:


x. ie rane 
we 
i Jo tn


ee k Ay 
De Dd, MAC oe 


k=O {h}-0 ti }-0 dy=0 h-0 


E\nily, Xx] See pe ge tne 


J2 n 


De oe F stl [start Ci 24,29] 


k=0 {4!=0 {j7,}=0 dj=0 h=0 i=] 


x] [4 14).8).C 29j-1)29)] (54) 


ju 


where:

$A_i = h_i$ for all $i = 1, 2, \ldots, K$,
$B_i = (n + d_{ii})$ for all $i = 1, 2, \ldots, K$,
$C_i = (j_0 + \Delta/2)$.


Vs (1977) results enable us to obtain exact and approximate solutions to functions of the form $F_1[A_i, B_i, C_i; Z_{2i-1}, Z_{2i}]$, which will be discussed in Appendix D. We can state the following result:


Theorem 5.1:

1. Consider the reparametrized linear regression model (5.2) with normal disturbances and a normal prior on $\alpha$ (the vector of coefficients). In addition, we assume the case of only one eigenvalue $\lambda$ (the common diagonal element of $\Lambda$) [see case (1)]. Let $\hat\alpha_i(A)$ be the Bayes estimator of $\alpha_i$ for all $i$. Then $\hat\alpha_i(A) = E[\eta_i|y, X]\,\hat\alpha_i(0)$ for all $i = 1, 2, \ldots, K$, where $E[\eta_i|y, X]$ is given by (5.4), given the conditional Haar invariant distribution of the matrix $G$. $\hat\alpha_i(A)$ is admissible for a squared error loss function.

2. Consider the model specified in (5.1). Let $\hat\beta(A)$ be the Bayes estimator of $\beta$. Then $\hat\beta(A) = [(A')^{-1}\eta A']\hat\beta(0)$, given the conditional Haar invariant distribution of $G$, where

$$\eta = \mathrm{Diag}[E(\eta_1|y, X), \ldots, E(\eta_K|y, X)].$$

$\hat\beta(A)$ is admissible for $\beta$. Furthermore,

$$\hat\beta(A) = [(A'(U))^{-1}\eta\,A'(U)]\hat\beta(0),$$

where $A'(U)$ is the expectation of $A'$ taken with respect to the conditional Haar invariant distribution of $G$.


Proof:

1. Consider the model in (5.2). The Bayes estimator for a squared error loss function has been derived for this model in Chapter 2. The Bayes estimator is $\hat\alpha_i(A) = E[\eta_i|y, X]\,\hat\alpha_i(0)$ for all $i$, where $\eta_i$ has been defined in Section 5.1. Both $\eta_i$ and $\hat\alpha_i(0)$ depend on the transformation matrix discussed earlier. We are able to factor $\hat\alpha_i(0)$ out of the evaluation of the posterior expectation because the distribution of the $\sigma^2_{\alpha i}$ and the distribution of the matrix of normalized eigenvectors $G$ are independent. The matrix of eigenvectors $G$ is distributed with a conditional Haar invariant distribution, described in greater detail in Anderson (1984), pp. 321-322 and Kshirsagar (1972), p. 441. Hence, the transformation matrix $A = C'\Lambda^{1/2}G$, given $X$, is distributed independently of $\sigma^2_{\alpha i}$ for all $i$, and hence of $\eta_i$ for all $i$. $E[\eta_i|y, X]$ is obtained in (5.4), enabling us to compute $\hat\alpha_i(A)$ for all $i$ given the conditional Haar invariant distribution of $G$. Further, the Bayes estimator $\hat\alpha_i(A)$ is admissible for all $i$ by Theorem 2.3, Chapter 2.
2. For the model specified in (5.1), the Bayes estimator of $\beta$ is given by $\hat\beta(A) = (A')^{-1}\hat\alpha(A)$, since $\beta = (A')^{-1}\alpha$. Given $y$, $X$ and $G$, and since $\hat\alpha(A)$ is Bayes for $\alpha$, it follows that $\hat\beta(A)$ is Bayes for $\beta$. We write $\hat\beta(A) = [(A')^{-1}\eta A']\hat\beta(0)$, given the conditional Haar invariant distribution of $G$, where

$$\eta = \mathrm{Diag}[E(\eta_1|y, X), \ldots, E(\eta_K|y, X)].$$

By the application of Theorem 2.3, Chapter 2, to model (5.1), it follows that $\hat\beta(A)$ is admissible for $\beta$. Hence,

$$\hat\beta(A) = [(A'(U))^{-1}\eta\,A'(U)]\hat\beta(0),$$

where $A'(U)$ is the expectation of $A'$ taken with respect to the conditional Haar invariant distribution of $G$.
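The construction used in the proof, $A = C'\Lambda^{1/2}G$, can be traced numerically. In the sketch below, $\Omega$ (the inverted Wishart scale matrix) and the value of $\Sigma_\beta$ are illustrative assumptions and $X$ is simulated. $C$ is built so that $C\Omega C' = I$ and $C(X'X)^{-1}C' = \Lambda^{-1}$, then $H = C\Sigma_\beta C'$, $B = \Lambda^{1/2}H\Lambda^{1/2}$, and $G$ holds the normalized eigenvectors of $B$. The code checks that $A'(X'X)^{-1}A = I$ and that $A'\Sigma_\beta A$ is diagonal with the roots of $B$ on the diagonal; it does not simulate the Haar distribution of $G$.

```python
import numpy as np

rng = np.random.default_rng(4)
T, K = 60, 3
X = rng.normal(size=(T, K))
xtx_inv = np.linalg.inv(X.T @ X)
R = rng.normal(size=(K, K))
Omega = R @ R.T + np.eye(K)          # illustrative inverted Wishart scale matrix
S = rng.normal(size=(K, K))
Sigma_beta = S @ S.T + np.eye(K)     # one illustrative value of Sigma_beta

# C with C Omega C' = I and C (X'X)^{-1} C' = Lambda^{-1} = diag(d).
L_inv = np.linalg.inv(np.linalg.cholesky(Omega))
d, Q = np.linalg.eigh(L_inv @ xtx_inv @ L_inv.T)
C = Q.T @ L_inv
Lam_half = np.diag(1.0 / np.sqrt(d))             # Lambda^{1/2}, since Lambda = diag(1/d)

H = C @ Sigma_beta @ C.T
B = Lam_half @ H @ Lam_half                      # roots of B are the sigma^2_{alpha i}
roots, G = np.linalg.eigh(B)                     # G: normalized eigenvectors of B
A = C.T @ Lam_half @ G                           # A = C' Lambda^{1/2} G

print(np.allclose(A.T @ xtx_inv @ A, np.eye(K)),
      np.allclose(A.T @ Sigma_beta @ A, np.diag(roots)))
```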


5.2: Linear Regression with Errors in Variables 


Linear regression models specified on the basis of economic theory are aimed at studying the impact of the independent variables on the dependent variable. The vector of unknown coefficients, $\beta$, measures this impact. The implicit assumption is that the independent variables are accurately and precisely measured. However, the measurements of the independent variables are absolutely error-free only in rare cases. Generally, economic data $(Y_t, X_t)$ derived from government or private publications report only a limited number of significant digits. These rounding-off errors are a likely source of error in the measurement of variables. In addition, some variables are inherently difficult to measure.
We will present and consider the problem of errors in variables using the following model as an example:

$y = X^{*}\beta + \varepsilon$    (5.5)

where:

$y$ : $T \times 1$ vector of observations on the dependent variable.
$X^{*}$ : $T \times K$ matrix of true observations on the independent variables.
$X$ : $T \times K$ matrix of explanatory variables, all of which are assumed to be stochastic and distributed independently of $(\beta, \sigma^2)$; the matrix $X'X$ is generally assumed to be nonsingular.
$\beta$ : $K \times 1$ vector of unknown parameters of the model, assumed to have a normal prior, $N[\mu, \sigma_\beta^2 I]$.
$\varepsilon$ : $T \times 1$ vector of spherical disturbances distributed as $N[0, \sigma^2 I]$.
$V$ : $T \times K$ matrix of measurement errors assumed to have an $N[0, \Sigma_V]$ distribution.


Thus, in this case $X^{*} = X + V$, where the unobserved true variables are expressed as a combination of $X$ and the matrix of errors $V$. The distributional assumption on the matrix $V$ is motivated by the following discussion.


If the number of significant digits after the decimal point in the $j$th regressor is $h_j$, where $h_j$ is known, then the perturbation in the $j$th independent variable is given by

$d_j = \tfrac{1}{2}(10)^{-h_j},$

which is based on Wilkinson's error bound for rounding errors. The matrix of measurement errors $V = (V_{ij})$ is assumed to satisfy the following assumptions: $E(V_{ij}) = 0$ for all $i, j$; $E(V_{ij}^2) = d_j^2/3 = \sigma_{V_j}^2$ for all $i, j$; and $E(V_{ij}V_{i'j'}) = 0$ for all $i \neq i'$, $j \neq j'$. For a uniform variable ranging from $-d_j$ to $d_j$, the variance is $d_j^2/3$. Since the uniform distribution is often used in the context of rounding-off errors, we use the mean and the variance of the uniform variable but not its range. By adopting the larger range $(-\infty, \infty)$, we allow for measurement errors larger than rounding-off errors. As discussed earlier, this model is seen to be a random effects model in the classical framework. It can be reduced to simpler models using a transformation that diagonalizes $X'X$ and $V'V$ to their respective eigenvalues. Applying this transformation yields a reparametrized model, which is studied below.
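The rounding-error argument above can be checked numerically. The minimal sketch below computes Wilkinson's bound $d_j = \tfrac{1}{2}(10)^{-h_j}$ and the uniform variance $d_j^2/3$. It then compares the latter with the empirical variance of rounding errors over a deterministic, evenly spread sequence of values. The helper names and the choice of test sequence are illustrative assumptions and are not part of the original analysis.

```python
import math


def wilkinson_bound(h):
    """Half-unit rounding bound d = (1/2) * 10**(-h) for h decimal digits."""
    return 0.5 * 10.0 ** (-h)


def uniform_variance(d):
    """Variance of a uniform variable on [-d, d]."""
    return d * d / 3.0


def empirical_rounding_variance(h, n=100_000):
    """Variance of rounding errors over a deterministic, equidistributed sequence."""
    step = math.sqrt(2) * 1e-3  # irrational step, so the fractional parts spread evenly
    errors = []
    for k in range(1, n + 1):
        x = k * step
        errors.append(round(x, h) - x)  # error from keeping h decimal digits
    mean = sum(errors) / n
    return sum((e - mean) ** 2 for e in errors) / n


d = wilkinson_bound(2)
print(d, uniform_variance(d), empirical_rounding_variance(2))
```

The empirical variance should lie close to $d_j^2/3$. This supports using the uniform mean and variance while dropping the bounded range.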


Consider the model specified in (5.5) and assume that both $X'X$ and $V'V$ are positive definite. There exist orthogonal $(K \times K)$ matrices $A$, $A_1$ such that

$A'(X'X)A = \Lambda$
and
$A_1'(V'V)A_1 = \Lambda_1,$

where $\Lambda = \mathrm{Diag}(\lambda_1, \lambda_2, \ldots, \lambda_K)$, the $\lambda_i$ for all $i$ are the characteristic roots of $X'X$, and $\Lambda_1$ is the matrix of the characteristic roots of $V'V$. The transformation matrix $A$ depends on $X$, and $A_1$ depends on $V$.
The model in (5.5) is now written as

$y = Z\alpha + U\gamma + \varepsilon$    (5.6)

where:

$Z = XA$ and $Z'Z = \Lambda$,
$\alpha = A'\beta$,
$U = VA_1$ and $U'U = \Lambda_1$,
$\gamma = A_1'\beta$.


The prior assumed on $\beta$ in this case is not as general as in the other models considered in this study. A scalar covariance matrix for $\beta$ is assumed because the general case does not admit an explicit solution. It is also difficult to analyze the general covariance matrix $\Sigma_V$ and obtain compact solutions. We consider a special case where

$\sigma_{V_j} = \sigma_V$

for all $j$. In this case, $\Sigma_V = \sigma_V^2 I$. In this framework, the distribution of $\alpha$ is given by $E(\alpha|X) = A'\mu$ and $\mathrm{Cov}(\alpha|X) = A'\sigma_\beta^2 I A = \sigma_\beta^2 I = \sigma_\alpha^2 I$, where $\sigma_\alpha^2 = \sigma_\beta^2$. The prior on $\alpha$ given $X$ is therefore $N[A'\mu, \sigma_\alpha^2 I]$.


5.2.1 The Posterior Distribution of $(\sigma^2, \sigma_\alpha^2)$


The focus of this section is to derive the posterior distribution of $(\sigma^2, \sigma_\alpha^2)$ in the context of the linear model given by (5.6). Next, we define a shrinkage factor in terms of the variance components and obtain its posterior distribution. This enables us to obtain the Bayes estimator of $\alpha$, and hence that of $\beta$. We begin with a brief review of the Bayes estimator of $\alpha$ discussed in Chapter 2. The Bayes estimator of $\alpha$ for this model is written as

$\hat{\alpha}(B) = E[\alpha \mid y, X, \sigma_\alpha^2, \sigma^2, \delta]$
$\quad = (Z'Z/\delta + K)^{-1}(Z'Z/\delta)\,\hat{\alpha}(0) + [I - (Z'Z/\delta + K)^{-1}(Z'Z/\delta)]\,A'\mu,$

where $K = (\sigma^2/\sigma_\alpha^2)I$ and $\delta = (1 + \sigma_V^2\sigma_\alpha^2/\sigma^2)$.


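In terms of the reparametrized model (5.6), the GRE of $\alpha$ is written as

$\hat{\alpha}(K) = (Z'Z/\delta + K)^{-1}(Z'Z/\delta)\,\hat{\alpha}(0)$

given $K$, $X$, $\delta$ and $y$, with $K = (\sigma^2/\sigma_\alpha^2)I$. The OLS estimator of $\alpha$ is $\hat{\alpha}(0)$. The factor $\delta$ enters because the error term in the model (5.6) has the following covariance:

$\mathrm{Cov}(\varepsilon + U\gamma) = E(\varepsilon\varepsilon') + E\{U\gamma\gamma'U'\}.$

This is further simplified as

$\mathrm{Cov}(\varepsilon + U\gamma) = [\sigma^2 I + \sigma_V^2\sigma_\alpha^2 I] = \sigma^2\delta I,$

since $\gamma$ and $\varepsilon$ are independently distributed. Thus,

$\hat{\alpha}_i(K) = [\lambda_i\sigma_\alpha^2/(\sigma^2\delta + \lambda_i\sigma_\alpha^2)]\,\hat{\alpha}_i(0), \quad i = 1, 2, \ldots, K,$

is the ridge estimator of $\alpha_i$ given $\sigma^2$, $\sigma_\alpha^2$, $X$, $y$ and $\delta$.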
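Because $Z'Z = \Lambda$ is diagonal, the matrix ridge formula reduces to one shrinkage weight per component. The sketch below, written here for illustration with hypothetical function names, computes the weights $\lambda_i\sigma_\alpha^2/(\sigma^2\delta + \lambda_i\sigma_\alpha^2)$ and checks them against the diagonal of $(Z'Z/\delta + K)^{-1}(Z'Z/\delta)$. It also forms the Bayes estimate $w_i\hat{\alpha}_i(0) + (1 - w_i)\mu_{\alpha,i}$, where the prior mean vector stands in for $A'\mu$.

```python
def shrinkage_weights(lambdas, sigma2, sigma2_alpha, sigma2_V):
    """Per-component weights lambda_i*s_a / (s2*delta + lambda_i*s_a)."""
    delta = 1.0 + sigma2_V * sigma2_alpha / sigma2
    return [lam * sigma2_alpha / (sigma2 * delta + lam * sigma2_alpha) for lam in lambdas]


def matrix_form_weight(lam, sigma2, sigma2_alpha, sigma2_V):
    """Diagonal entry of (Z'Z/delta + K)^{-1} (Z'Z/delta) with K = (s2/s_a) I."""
    delta = 1.0 + sigma2_V * sigma2_alpha / sigma2
    k = sigma2 / sigma2_alpha
    return (lam / delta) / (lam / delta + k)


def bayes_estimate(alpha_ols, prior_mean, lambdas, sigma2, sigma2_alpha, sigma2_V):
    """Componentwise w_i * alpha_ols_i + (1 - w_i) * prior_mean_i (diagonal Z'Z)."""
    w = shrinkage_weights(lambdas, sigma2, sigma2_alpha, sigma2_V)
    return [wi * a + (1.0 - wi) * m for wi, a, m in zip(w, alpha_ols, prior_mean)]


lams = [4.0, 1.0, 0.25]
print(shrinkage_weights(lams, 2.0, 3.0, 0.5))
print(bayes_estimate([1.0, 2.0, 3.0], [0.0, 0.0, 0.0], lams, 2.0, 3.0, 0.5))
```

With a zero prior mean, the Bayes estimate is just the ridge estimate. Components with smaller eigenvalues are shrunk more strongly toward the prior mean.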

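When the prior mean on $\beta$, and hence on $\alpha$, is zero, i.e., when $\mu = 0$, the Bayes estimator coincides with the ridge estimator, that is, $\hat{\alpha}(K) = \hat{\alpha}(B)$. Hence the Bayes estimator $\hat{\beta}(B)$ of $\beta$ equals $\hat{\beta}(K)$. Without loss of generality, it will be assumed in the rest of the analysis in this chapter that the prior mean on $\beta$ is zero, that is, $\mu = 0$. Thus,

$\hat{\alpha}_i(B) = \hat{\alpha}_i(K) = [\lambda_i\sigma_\alpha^2/(\sigma^2\delta + \lambda_i\sigma_\alpha^2)]\,\hat{\alpha}_i(0)$ for all $i = 1, 2, \ldots, K,$

given $\sigma^2$, $\sigma_\alpha^2$, $X$, $\delta$ and $y$. Hence, the Bayes estimator of $\alpha_i$ is written as

$\hat{\alpha}_i(A) = E[\lambda_i\sigma_\alpha^2/(\sigma^2\delta + \lambda_i\sigma_\alpha^2) \mid y, X]\,\hat{\alpha}_i(0)$ for all $i$.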


The posterior density in the most general case, where all the eigenvalues $\lambda_i$ are distinct, is difficult to determine. We therefore consider an important special case where all the eigenvalues $\lambda_i$ are equal to $\lambda$. In this special case, the Bayes estimator is written as

$\hat{\alpha}_i(A) = E[\lambda\sigma_\alpha^2/(\sigma^2\delta + \lambda\sigma_\alpha^2) \mid y, X]\,\hat{\alpha}_i(0)$ for all $i$.

We shall derive and study the posterior distribution of $\lambda\sigma_\alpha^2/(\sigma^2\delta + \lambda\sigma_\alpha^2)$ given $y$ and $X$, employing prior knowledge on $\sigma^2$, $\sigma_\alpha^2$ and $\sigma_V^2$. Inverted gamma priors are assumed independently on $\sigma^2$, $\sigma_\alpha^2$ and $\sigma_V^2$. This yields the following joint prior on $(\sigma^2, \sigma_\alpha^2, \sigma_V^2)$:


$p'(\sigma^2, \sigma_\alpha^2, \sigma_V^2) \propto (\sigma^2)^{-A/2-1}(\sigma_\alpha^2)^{-A_1/2-1}(\sigma_V^2)^{-A_2/2-1}$
$\quad \times \exp -1/2\,[C_0/\sigma^2 + C_1/\sigma_\alpha^2 + C_2/\sigma_V^2].$
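The likelihood function of the sample as a function of $\alpha$, $\sigma^2$ and $\delta$ is

$L(\alpha, \sigma^2, \delta) \propto (\sigma^2\delta)^{-T/2} \exp -1/(2\sigma^2\delta)\,[(y - Z\alpha)'(y - Z\alpha)],$

where $Z$ is a function of $X$ and $A$. Since a prior is assumed on $\alpha$, from the Bayesian point of view the likelihood function can be viewed as a function of $(\sigma^2, \sigma_\alpha^2, \sigma_V^2)$. It is derived as follows. The distribution of $y$ is obtained by incorporating the prior on $\alpha$. The first two moments of $y$ are:

1. $E(y|X, V) = 0$,

2. $\mathrm{Cov}(y|X, V) = E(Z\alpha\alpha'Z'|X) + E[(\varepsilon + U\gamma)(\varepsilon + U\gamma)'|X, V] + 2E[Z\alpha(\varepsilon + U\gamma)'|X, V]$.

Let $J = U\gamma + \varepsilon$. Since

$E[\alpha'Z'J|X, V] = E[E(\alpha'Z'J|X, V, \alpha)] = E[\alpha'Z'E(J|X, \alpha, V)] = 0,$

we write $\Sigma = \mathrm{Cov}(y|X, V) = [\sigma^2 I + \sigma_\alpha^2 ZZ' + \sigma_\alpha^2\sigma_V^2 I] = \sigma_\alpha^2 ZZ' + \sigma^2\delta I$. Hence, the distribution of $y$ given $(X, V)$ is $N[0, \Sigma]$.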


The likelihood function is therefore

$L(\sigma^2, \sigma_\alpha^2, \sigma_V^2) \propto |\Sigma|^{-1/2} \exp -1/2\,[y'\Sigma^{-1}y],$

where $\delta$ was defined earlier. On simplification [see Appendix F, F.2 for details], the likelihood function is written as

$L(\sigma^2, \sigma_\alpha^2, \sigma_V^2) \propto (\sigma^2\delta)^{-T/2}\prod_{i=1}^{K}(1 + t\lambda_i/\delta)^{-1/2}$
$\quad \times \exp -1/(2\sigma^2\delta)\,[y'y - (t/\delta)\,q'(t\Lambda/\delta + I)^{-1}q],$

where $q = Z'y$ and $t = \sigma_\alpha^2/\sigma^2$.
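The simplified likelihood rests on two matrix identities for $\Sigma = \sigma^2\delta(I + (t/\delta)ZZ')$. The first is the determinant identity $|I_T + cZZ'| = |I_K + cZ'Z|$ and the second is the Woodbury form of the inverse. The sketch below checks numerically that the eigenvalue form agrees with a direct evaluation of $\log|\Sigma|$ and $y'\Sigma^{-1}y$. It assumes NumPy is available and uses simulated data chosen purely for illustration. It checks the algebra only and is not the derivation given in Appendix F.

```python
import numpy as np


def direct_loglik(y, Z, sigma2, sigma2_alpha, sigma2_V):
    """-1/2 log|Sigma| - 1/2 y' Sigma^{-1} y with Sigma = s_a ZZ' + s2*delta I."""
    T = len(y)
    delta = 1.0 + sigma2_V * sigma2_alpha / sigma2
    Sigma = sigma2_alpha * Z @ Z.T + sigma2 * delta * np.eye(T)
    _, logdet = np.linalg.slogdet(Sigma)
    quad = y @ np.linalg.solve(Sigma, y)
    return -0.5 * logdet - 0.5 * quad


def spectral_loglik(y, Z, lambdas, sigma2, sigma2_alpha, sigma2_V):
    """Same quantity via the eigenvalue form, with q = Z'y and t = s_a / s2."""
    T = len(y)
    delta = 1.0 + sigma2_V * sigma2_alpha / sigma2
    t = sigma2_alpha / sigma2
    c = t / delta
    q = Z.T @ y
    logdet = T * np.log(sigma2 * delta) + np.sum(np.log1p(c * lambdas))
    quad = (y @ y - c * np.sum(q**2 / (1.0 + c * lambdas))) / (sigma2 * delta)
    return -0.5 * logdet - 0.5 * quad


rng = np.random.default_rng(0)
X = rng.normal(size=(30, 4))
lambdas, A = np.linalg.eigh(X.T @ X)  # A'(X'X)A = Lambda
Z = X @ A                             # hence Z'Z = Lambda
y = rng.normal(size=30)
params = dict(sigma2=1.5, sigma2_alpha=0.8, sigma2_V=0.3)
print(direct_loglik(y, Z, **params), spectral_loglik(y, Z, lambdas, **params))
```

The spectral form needs only $y'y$, $q = Z'y$ and the eigenvalues. This is why the posterior analysis can proceed in terms of scalar quantities.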

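Considering only the special case where $\lambda_1 = \lambda_2 = \cdots = \lambda_K = \lambda$, we write the posterior distribution of $(\sigma^2, \sigma_\alpha^2, \sigma_V^2)$ by Bayes' theorem as

$p''(\sigma^2, \sigma_\alpha^2, \sigma_V^2) \propto (\sigma^2)^{-(T+A)/2-1}(\sigma_\alpha^2)^{-A_1/2-1}(\sigma_V^2)^{-A_2/2-1}\,\delta^{-T/2}$
$\quad \times (1 + t\lambda/\delta)^{-K/2} \exp -1/2\,[H_0 + (1/(\sigma^2\delta))(y'y - (t/\delta)\,q'(I + t\lambda I/\delta)^{-1}q)],$

where $H_0 = C_0/\sigma^2 + C_1/\sigma_\alpha^2 + C_2/\sigma_V^2$. Consider the transformation to the $(\sigma^2, \sigma_\alpha^2, h)$ space, where

$h = \sigma^2/(\sigma_\alpha^2\sigma_V^2 + \sigma^2) = 1/\delta.$

The inverse transformation is given by

$\sigma_V^2 = (1 - h)/(th).$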

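Lemma 5.2: The Jacobian of the transformation from the $(\sigma^2, \sigma_\alpha^2, \sigma_V^2)$ space to the $(\sigma^2, \sigma_\alpha^2, h)$ space is given by

$|J| = 1/(th^2).$

Proof:
The Jacobian of the transformation is

$|J| = |\det Q|,$

where the matrix $Q$ is given by

$Q = \dfrac{\partial(\sigma^2, \sigma_\alpha^2, \sigma_V^2)}{\partial(\sigma^2, \sigma_\alpha^2, h)} = \begin{pmatrix} 1 & 0 & (1-h)/(h\sigma_\alpha^2) \\ 0 & 1 & -\sigma^2(1-h)/(h\sigma_\alpha^4) \\ 0 & 0 & -1/(th^2) \end{pmatrix}.$

Since $Q$ is triangular, $|J| = |\det Q| = 1/(th^2)$. ■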
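Lemma 5.2 can be checked by differentiating the map $(\sigma^2, \sigma_\alpha^2, h) \mapsto (\sigma^2, \sigma_\alpha^2, \sigma_V^2)$ numerically. This minimal sketch uses central finite differences, a method chosen here for illustration, and compares the absolute determinant with $1/(th^2)$, where $t = \sigma_\alpha^2/\sigma^2$. The function names are hypothetical.

```python
def forward_map(s2, s2a, h):
    """(sigma^2, sigma_alpha^2, h) -> (sigma^2, sigma_alpha^2, sigma_V^2)."""
    t = s2a / s2
    return (s2, s2a, (1.0 - h) / (t * h))


def numerical_jacobian_det(s2, s2a, h, eps=1e-6):
    """Determinant of d(outputs)/d(inputs) by central differences."""
    point = [s2, s2a, h]
    cols = []
    for j in range(3):
        up, dn = point[:], point[:]
        up[j] += eps
        dn[j] -= eps
        fu, fd = forward_map(*up), forward_map(*dn)
        cols.append([(fu[i] - fd[i]) / (2 * eps) for i in range(3)])
    J = [[cols[j][i] for j in range(3)] for i in range(3)]  # J[i][j] = d out_i / d in_j
    return (J[0][0] * (J[1][1] * J[2][2] - J[1][2] * J[2][1])
            - J[0][1] * (J[1][0] * J[2][2] - J[1][2] * J[2][0])
            + J[0][2] * (J[1][0] * J[2][1] - J[1][1] * J[2][0]))


s2, s2a, h = 2.0, 3.0, 0.4
print(abs(numerical_jacobian_det(s2, s2a, h)), 1.0 / ((s2a / s2) * h * h))
```

The determinant itself is negative because $\sigma_V^2$ decreases in $h$. The density transformation uses its absolute value.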
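Using this transformation, we find that the posterior of $(\sigma^2, \sigma_\alpha^2, h)$ is

$p''(\sigma^2, \sigma_\alpha^2, h) \propto (\sigma^2)^{-(T+A)/2-1}(\sigma_\alpha^2)^{-A_1/2-1}\,t^{A_2/2}\,h^{(T+A_2)/2-1}(1-h)^{-A_2/2-1}$
$\quad \times (1 + t\lambda h)^{-K/2} \exp -1/(2\sigma^2)\,[C_0 + C_1/t + \sigma^2 C_2\,th/(1-h) + H_1],$

where

$H_1 = h\,\Big(y'y - \dfrac{th}{1 + th\lambda}\sum_{i=1}^{K} q_i^2\Big).$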


Using multinomial expansions [see Appendix F, F.3], the analysis proceeds further by simplifying the posterior density of $(\sigma^2, \sigma_\alpha^2, h)$. The posterior density of $(\sigma^2, \sigma_\alpha^2)$ is obtained from Appendix F (F.3) and is written as


g-(P +4)/2-12- 


, | HOV (Cyt ea 
xexp-1/2 [Cp /o? ie Ca 


p"(o, a) e ae uo 


where 


u = (-1)""***" gq, p, (A) (y'y)® / (2h Ny! ") 
and 


$\sum{}^{*} = \sum_{m=0}^{\infty}\sum_{\{l_j\}=0}^{\infty}\sum_{k=0}^{\infty}\sum_{v=0}^{\infty}$

for notational convenience. Considering the transformation to the $(\sigma^2, t)$ space, which has Jacobian $\sigma^2$, we obtain the posterior density of $(\sigma^2, t)$ as


g-(T +A)/2-Ig 


p"(o?, the Duar EyP(C,)*t"(o2 ee! 


exp-1/2 [Cy /o? $C. Tart. 


Lens tree 
' 
Integrating out $\sigma^2$, we obtain the posterior density of $t$ as


= utr 40/2-1 lL C l, 
$[C_0 + C_1/t]^{J_m},$

where $J_m = (T + A + A_1)/2 + l$.


In the special case assumed before, where all the eigenvalues are equal, we transform to the $\eta$ space, with $\eta$ given by

$\eta = \lambda t/(1 + \lambda t),$

and obtain the posterior density of $\eta$ as


p(n) «Xp QP (Cy) uly) *°” 
x(l— ay "*42!2) 1[Con t+ ACg (= I". (5.8) 


Thus, we can obtain the expectation of $\eta$ as

$E[\eta|y, X] = \dfrac{\int_0^1 \eta\, p''(\eta|y, X)\,d\eta}{\int_0^1 p''(\eta|y, X)\,d\eta}.$


We define the following integral: 
mn” (L-1)" 


gO Spy 2g | ee 
fA see eee ears 


where $z_j \in [0, 1]$ for $j = 1, 2$. This integral is a generalization of the hypergeometric function studied by Appell. We define


Z() ad 2(T, Fn Ges Wns Pus 4, Ur G ls Sn) 
QP (CY agg Dy (A) (y'y)® 
2” (Ly Nl Nlls 8) 
In terms of $f(\cdot)$, we can obtain $E[\eta|y, X]$ as follows:

$E[\eta|y, X] = \dfrac{\sum^{*} Z(\cdot)\, f(A+1, B, C; z_1, z_2)}{\sum^{*} Z(\cdot)\, f(A, B, C; z_1, z_2)}$    (5.9)




where $z_1 = 0$ and $z_2 = (1 - \lambda C_1/C_0)$. We shall assume that $C_0 > \lambda C_1$, so that $0 < z_2 < 1$. The parameters $A$, $B$ and $C$ are given by the following:

$A = (n - A_1/2)$
$B = (-n + A_1/2 - 1)$
$C = (J_m)$.

In this case, $f(\cdot)$ can be evaluated directly using the results of Hill (1977).
We can now state the following result: 


Theorem 5.2:

1. Consider the reparametrized linear regression model with measurement errors and a normal prior on $\alpha$ (the vector of coefficients). Let the Bayes estimator of $\alpha_i$ be $\hat{\alpha}_i(A)$ for all $i$. Then $\hat{\alpha}_i(A) = E[\eta|y, X]\,\hat{\alpha}_i(0)$ for all $i = 1, 2, \ldots, K$, where $E[\eta|y, X]$ is given in (5.9). Moreover, $\hat{\alpha}_i(A)$ is admissible under a squared error loss function.
2. For the model specified in (5.6), the Bayes estimator of $\beta$ is $\hat{\beta}(A) = (A')^{-1}\hat{\alpha}(A)$, which is admissible for $\beta$. ■
Proof:

1. Consider the model in (5.6). It has been shown in Chapter 2 that the Bayes estimator of $\alpha_i$ is $\hat{\alpha}_i(A) = E[\eta|y, X]\,\hat{\alpha}_i(0)$ for all $i$. $E[\eta|y, X]$ is obtained in (5.9), which enables us to compute $\hat{\alpha}_i(A)$ for all $i$. By the application of Theorem 2.3, Chapter 2, $\hat{\alpha}_i(A)$ is admissible for $\alpha_i$.
2. The Bayes estimator of $\beta$ is given by $\hat{\beta}(A) = (A')^{-1}\hat{\alpha}(A)$, since $\beta = (A')^{-1}\alpha$. By the application of Theorem 2.3, Chapter 2 to this model, $\hat{\beta}(A)$ is admissible for $\beta$. ■
The existence of uncertainty in decision making and the lack of information on the part of economic agents have become increasingly prominent features of economic modelling. Economic models that attempt to be more realistic and useful have tried to incorporate uncertainty and to tackle the problem of inadequate information. In the face of uncertainty and inadequate information, economic decision makers are continually required to make decisions based on unknown present or future values of relevant variables, which must be predicted. This leads to the formulation of schemes that enable future and unknown present values to be predicted from past and available data.


Several expectations hypotheses have arisen in the literature in this context, representing various schemes to predict unknown present and future values. Important variants are the adaptive expectations hypothesis due to Nerlove (1958), the rational expectations hypothesis due to Muth (1961) and the stable cobweb model due to Carlson (1968). A Bayesian approach to the theory of expectations was proposed by Turnovsky (1974). We propose the same approach here. Unlike Turnovsky (1974), however, we assume that the econometrician, as well as the market participants where applicable, is not aware of the variances of the variables under consideration. The Bayesian framework developed in Chapter 3 will be applied in the context of the linear model to propose an expectations hypothesis that is Bayesian in nature. This will form the first set of applications studied in this chapter.


The second application considers a model of duopoly in which economic agents face uncertainty. It demonstrates how a Bayesian approach, together with the Bayesian expectations scheme developed earlier, can be used to characterize the market solution. A model of duopoly proposed by Cyert and DeGroot (1974) will be the starting point of our analysis. Our analysis differs in that the firms' uncertainty about variances is captured by a Bayesian analysis that assumes priors on the unknown variances of the variables being modelled.

The two sets of applications considered in this chapter illustrate a class of problems in economics to which the random effects models developed in Chapter 3 can be applied.


6.1 Theory of Expectations: A Bayesian Variant 


Turnovsky (1974) posits a Bayesian approach to the theory of expectations that attempts to provide a behavioural basis for the proposed predictors. He contrasts it with other schemes, such as the adaptive expectations hypothesis, which are mostly ad hoc. This approach presumes that the decision maker has some prior knowledge about the statistical process being observed. Schemes such as the adaptive expectations scheme can be seen to arise as special cases of the Bayesian scheme.
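For reference, the adaptive expectations scheme of Nerlove (1958) updates an expectation as $x^e_{t+1} = \lambda x^e_t + (1-\lambda)x_t$. This has the same $\lambda$-weighted form as the Bayesian posterior means derived in the cases that follow. The sketch below uses a fixed weight $\lambda$ and hypothetical function names, both chosen purely for illustration. It iterates the recursion and checks it against its geometric distributed-lag closed form. It illustrates the special-case connection only and is not Turnovsky's derivation.

```python
def adaptive_expectations(x, lam, x0):
    """Recursion xe[t+1] = lam * xe[t] + (1 - lam) * x[t], starting from xe[0] = x0."""
    xe = [x0]
    for xt in x:
        xe.append(lam * xe[-1] + (1.0 - lam) * xt)
    return xe


def distributed_lag(x, lam, x0):
    """Closed form: geometric weights (1 - lam) * lam**j on past observations."""
    T = len(x)
    return (1.0 - lam) * sum(lam**j * x[T - 1 - j] for j in range(T)) + lam**T * x0


x = [1.0, 2.0, 0.5, 3.0, 2.5]
print(adaptive_expectations(x, 0.6, 1.0)[-1], distributed_lag(x, 0.6, 1.0))
```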


Several cases in the context of the linear regression model will be considered in which relevant and important variables are not observed. In these cases, expectation-forming mechanisms need to be specified before we can proceed with econometric inference. A Bayesian expectations scheme is proposed for such cases, which then enables us to proceed with inference in the linear regression model. We consider three specific cases and in each of them develop a Bayesian approach to the theory of expectations.


Case 1. Unobserved, Stochastic Independent Variable:
Consider the linear regression model

$Y_t = \alpha + \beta X_t^{*} + \varepsilon_t$    (6.1)

for all $t = 1, 2, \ldots, T$, where:

$Y_t$ : represents the dependent variable in time period $t$.
$y_t$ : represents the sample mean of cross-sectional observations on $Y_t$ in time period $t$.
$X_t^{*}$ : represents the explanatory variable in time period $t$, which is assumed to be stochastic in this case and is unobserved.
$\alpha$ : represents the constant term of the regression model.
$\beta$ : represents the unknown parameter of the model.
$\varepsilon_t$ : represents the disturbance term in time period $t$, which is normally distributed with mean and variance given by
$E(\varepsilon_t) = 0,$
$E(\varepsilon_t^2) = \sigma_Y^2.$

This implies that the disturbances are homoscedastic. It is further assumed that the disturbances are serially uncorrelated.

An important example of this kind arises in estimating a consumption function in which consumption is specified to be a function of permanent income. Since permanent income is not observed, estimates of $\alpha$ and $\beta$ cannot be obtained directly, although $\alpha$ and $\beta$ can be important parameters from the policy-making point of view. In the case being considered, we assume that the investigator has only the data on $Y_t$ to work with. No suitable proxy variable for $X_t^{*}$ is readily available. To overcome this problem, we use an expectations scheme on $Y_t$ that emerges naturally from Bayes' theorem of learning.




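Let $\bar{Y}_t$ be defined by

$E[Y_t|\alpha, \beta, X_t^{*}] = \bar{Y}_t = \alpha + \beta X_t^{*}.$

Then the distribution of $Y_t$ is

$(Y_t|\alpha, \beta, X_t^{*}) \sim N[\bar{Y}_t, \sigma_Y^2].$

Considering this model to be a random effects model in the classical framework, we assume priors on $\alpha$ and $\beta$ of the following form:

$\alpha \sim N[0, \sigma_\alpha^2], \quad \beta \sim N[m, \sigma_\beta^2].$

It is assumed here that $\alpha$ and $\beta$ are independently distributed. This leads to a prior on $\bar{Y}_t$ that is normally distributed as

$\bar{Y}_t \sim N[\bar{Y}_t', \sigma_{Y'}^2],$

where $\bar{Y}_t' = E[\bar{Y}_t|X_t^{*}] = mX_t^{*}$ and $\mathrm{Var}[\bar{Y}_t|X_t^{*}] = \sigma_{Y'}^2 = \sigma_\alpha^2 + \sigma_\beta^2 X_t^{*2}$. This holds because, by definition, $\bar{Y}_t = f(X_t^{*})$, and hence the prior on $\bar{Y}_t$ is obtained by conditioning on $X_t^{*}$. In the absence of any other source of information, the posterior distribution of $\bar{Y}_t$ in time period $t$ is obtained by combining the prior on $\bar{Y}_t$ with the sample mean of the observed values of $Y_t$. By Bayes' theorem, the posterior distribution of $\bar{Y}_t$ is

$\bar{Y}_t|X_t^{*} \sim N[\bar{Y}_t'', \sigma_{Y''}^2]$

given the data on $Y_t$. The posterior mean $\bar{Y}_t''$, conditional on the value of $X_t^{*}$, is given by

$\bar{Y}_t'' = \lambda_t\bar{Y}_t' + (1 - \lambda_t)y_t,$    (6.2)

where $y_t$ is the sample mean of the observations on $Y_t$ and $\lambda_t = \sigma_Y^2/[\sigma_Y^2 + n\sigma_{Y'}^2]$. The posterior variance is given by

$\sigma_{Y''}^2 = [n/\sigma_Y^2 + 1/\sigma_{Y'}^2]^{-1}.$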


It is assumed here that there are $n$ observations available on $Y_t$ in time period $t$, made on cross-sectional units such as households or firms.
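The normal-normal update used above combines a $N[\bar{Y}_t', \sigma_{Y'}^2]$ prior with the sample mean of $n$ observations, each with variance $\sigma_Y^2$. It has the standard conjugate form: the posterior mean is $\lambda_t\bar{Y}_t' + (1-\lambda_t)y_t$ with $\lambda_t = \sigma_Y^2/(\sigma_Y^2 + n\sigma_{Y'}^2)$, and the posterior variance is $[n/\sigma_Y^2 + 1/\sigma_{Y'}^2]^{-1}$. The minimal sketch below uses hypothetical argument names and computes all three quantities. It assumes the variances are known, whereas the text later integrates over them.

```python
def posterior_of_mean(prior_mean, prior_var, obs_var, sample_mean, n):
    """Conjugate normal update for a mean, written in lambda-weighted form."""
    lam = obs_var / (obs_var + n * prior_var)  # weight on the prior mean
    post_mean = lam * prior_mean + (1.0 - lam) * sample_mean
    post_var = 1.0 / (n / obs_var + 1.0 / prior_var)
    return lam, post_mean, post_var


print(posterior_of_mean(prior_mean=2.0, prior_var=0.5, obs_var=4.0, sample_mean=3.0, n=10))
```

The weighted form is algebraically identical to the usual precision-weighted average. The posterior variance also equals $\lambda_t$ times the prior variance.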


The analysis proceeds further by linking the prior mean of $\bar{Y}_{t+1}$ to the prior mean of $\bar{Y}_t$ by assuming a conditional distribution on $X_{t+1}^{*}$. In time period $t$, $\bar{Y}_{t+1} = f(X_{t+1}^{*})$ is unknown to the econometrician because $X_{t+1}^{*}$ is unknown. The prior mean of $\bar{Y}_{t+1}$ is obtained as follows. As defined before,

$E[Y_{t+1}|\alpha, \beta, X_{t+1}^{*}] = \bar{Y}_{t+1} = \alpha + \beta X_{t+1}^{*}.$

Incorporating the priors on $\alpha$ and $\beta$ assumed before,

$E[\bar{Y}_{t+1}|X_{t+1}^{*}] = mX_{t+1}^{*}.$

Now, we assume that $X_{t+1}^{*}$ has a distribution conditional on $X_t^{*}$ and $I_t$ (which contains all other relevant information in time period $t$) with mean $g(X_t^{*})$, where $g(X_t^{*})$ is a known function of $X_t^{*}$ given $I_t$. This analysis specifies neither the parametric form of this distribution nor its variance. Using the conditional distribution of $X_{t+1}^{*}$ assumed above, the prior mean of $\bar{Y}_{t+1}$ is

$\bar{Y}_{t+1}' = E[\bar{Y}_{t+1}|X_t^{*}, I_t] = mE[X_{t+1}^{*}|X_t^{*}, I_t] = mg(X_t^{*}).$

Thus, the prior mean of $\bar{Y}_{t+1}$ is

$\bar{Y}_{t+1}' = mg(X_t^{*}).$    (6.3)


The prior distribution of $\bar{Y}_{t+1}$ has mean $\bar{Y}_{t+1}'$. We do not calculate the variance of $\bar{Y}_{t+1}$, as the analysis does not depend on it.


Equation (6.3) can be modified further and written as

$X_{t+1}^{*} - \lambda_t X_t^{*} = [mg(X_t^{*}) - \lambda_t X_t^{*}] + v_{t+1}$    (6.4)

given $X_t^{*}$ and $\lambda_t$ (defined before). Our analysis will assume that the variances defining $\lambda_t$ are unknown to the econometrician. Hence equation (6.4) will be evaluated with respect to the posterior distribution of $\sigma_Y^2$ and $\sigma_{Y'}^2$, by taking the posterior expectation of $\lambda_t$:

$X_{t+1}^{*} - E[\lambda_t|\text{data}]X_t^{*} = mg(X_t^{*}) - E[\lambda_t|\text{data}]X_t^{*} + v_{t+1},$

given $X_t^{*}$. Letting $E[\lambda_t|\text{data}] = \bar{\lambda}_t$, we get:

$X_{t+1}^{*} - \bar{\lambda}_t X_t^{*} = mg(X_t^{*}) - \bar{\lambda}_t X_t^{*} + v_{t+1},$    (6.5)
where $X_t^{*}$ is given. Equation (6.5) describes an expectation-generating scheme in which the expected or permanent (unobserved) value $X_{t+1}^{*}$ is expressed as a function of $X_t^{*}$. In a special case where $g(X_t^{*})$ takes a specific functional form, equation (6.5) reduces to a scheme of adaptive expectations. Using equation (6.5) and the conditional distribution assumed on $X_{t+1}^{*}$, we can rewrite the model as:


Modelling of Expectations and Uncertainty on 113 


$Y_{t+1} - \bar{\lambda}_t Y_t = (1 - \bar{\lambda}_t)\alpha + \beta(X_{t+1}^{*} - \bar{\lambda}_t X_t^{*}) + (\varepsilon_{t+1} - \bar{\lambda}_t\varepsilon_t),$

that is,

$Y_{t+1} = \bar{\lambda}_t Y_t + (1 - \bar{\lambda}_t)\alpha + \beta(mg(X_t^{*}) - \bar{\lambda}_t X_t^{*}) + u_{t+1},$    (6.6)

where $u_{t+1} = (\varepsilon_{t+1} - \bar{\lambda}_t\varepsilon_t) + \beta v_{t+1}$, and $m$, $\bar{\lambda}_t$ and the function $g$ are known. The unknown $\lambda_t$ is estimated by its posterior mean $\bar{\lambda}_t$. Equation (6.6) is conditional on $X_t^{*}$, which is assumed to be unknown, so it is not by itself operational from the point of view of estimation. However, the investigator would be able to express $(mg(X_t^{*}) - \bar{\lambda}_t X_t^{*})$ in terms of an observable quantity, depending on the type of experiment and/or dataset on hand. (This would essentially correspond to a growth rate in $X_t^{*}$ over time.) This would make equation (6.6) operational in estimating the parameters $\alpha$ and $\beta$ by maximum likelihood methods. Bayes estimates can also be obtained by incorporating the prior information already assumed on the unknown parameters $\alpha$ and $\beta$. We briefly outline how $E[\lambda_t]$ could be evaluated and used in the method described above.
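Once $(mg(X_t^{*}) - \bar{\lambda}_t X_t^{*})$ has been replaced by an observable series, say $z_t$, equation (6.6) becomes a regression of $Y_{t+1} - \bar{\lambda}_t Y_t$ on a constant and $z_t$. The intercept of this regression equals $(1 - \bar{\lambda}_t)\alpha$. The sketch below treats $\bar{\lambda}_t$ as a single known constant and uses simple least squares, both assumptions made here for illustration, with a hypothetical proxy $z_t$. Because the composite error $u_{t+1}$ is a moving average, least squares is not the maximum likelihood procedure mentioned in the text. It only shows how the quasi-differenced equation identifies $\alpha$ and $\beta$.

```python
def quasi_difference_ols(Y, z, lam_bar):
    """OLS of Y[t+1] - lam_bar*Y[t] on (1, z[t]); returns (alpha_hat, beta_hat)."""
    dep = [Y[t + 1] - lam_bar * Y[t] for t in range(len(Y) - 1)]
    reg = z[: len(dep)]  # hypothetical observable proxy for mg(X*_t) - lam_bar*X*_t
    n = len(dep)
    zbar = sum(reg) / n
    dbar = sum(dep) / n
    sxx = sum((zi - zbar) ** 2 for zi in reg)
    sxy = sum((zi - zbar) * (di - dbar) for zi, di in zip(reg, dep))
    beta_hat = sxy / sxx
    intercept = dbar - beta_hat * zbar
    alpha_hat = intercept / (1.0 - lam_bar)  # intercept = (1 - lam_bar) * alpha
    return alpha_hat, beta_hat


z = [1.0, 3.0, 2.0, 5.0, 4.0, 6.0]
Y = [1.0]
for zt in z:  # noiseless data generated from alpha = 2, beta = 0.5, lam_bar = 0.4
    Y.append(0.4 * Y[-1] + 0.6 * 2.0 + 0.5 * zt)
print(quasi_difference_ols(Y, z, 0.4))
```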

In Chapter 3, the general linear model with $k$ explanatory variables was considered, and in this context the posterior expectation of ratios such as $\lambda_t$ was derived. The present case is a special case of that more general one, and the methods developed there are easily applicable here. For the sake of completeness, we briefly outline the method as applied to the model under consideration. For time period $t$, suppose there are $n$ observations available on the cross-sectional units being studied, such as households or firms.

We write the model in terms of the sample mean $y_t$ in time period $t$ as

$y_t = \bar{Y}_t + \bar{\varepsilon}_t$    (6.7)

for all $t = 1, 2, \ldots, T$. Here $\bar{Y}_t = \alpha + \beta X_t^{*}$, since $\bar{Y}_t$ does not vary with the $n$ observations collected on $Y_t$. The term $\bar{\varepsilon}_t$ is the sample mean of the disturbance term considered before in time period $t$ and is distributed as $N[0, \sigma_Y^2/n]$. The prior assumed on $\bar{Y}_t$ is $N[\bar{Y}_t', \sigma_{Y'}^2]$. The likelihood function is written as


written as 


$L(\sigma_Y^2, \sigma_{Y'}^2) \propto (\sigma_Y^2/n + \sigma_{Y'}^2)^{-T/2} \exp\Big[-\sum_{t=1}^{T}(y_t - \bar{Y}_t')^2 \big/ 2(\sigma_Y^2/n + \sigma_{Y'}^2)\Big].$
Priors on $\sigma_Y^2$ and $\sigma_{Y'}^2$ are assumed to be from the inverted gamma family. Assuming also that they are independently distributed, we write the joint prior as

$p'(\sigma_Y^2, \sigma_{Y'}^2) \propto (\sigma_Y^2)^{-A/2-1}(\sigma_{Y'}^2)^{-A'/2-1} \exp -1/2\,[C_Y/\sigma_Y^2 + C_{Y'}/\sigma_{Y'}^2].$


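By Bayes' theorem, we write the posterior distribution as

$p''(\sigma_Y^2, \sigma_{Y'}^2) \propto (\sigma_Y^2)^{-A/2-1}(\sigma_{Y'}^2)^{-A'/2-1}(\sigma_Y^2/n + \sigma_{Y'}^2)^{-T/2}$
$\quad \times \exp -1/2\,\Big[\dfrac{C_Y}{\sigma_Y^2} + \dfrac{C_{Y'}}{\sigma_{Y'}^2} + \dfrac{\sum_{t=1}^{T}(y_t - \bar{Y}_t')^2}{\sigma_Y^2/n + \sigma_{Y'}^2}\Big].$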


Consider the following transformation:

$\lambda_t = \sigma_Y^2/(\sigma_Y^2 + n\sigma_{Y'}^2),$
$\theta_t = (\sigma_Y^2 + n\sigma_{Y'}^2),$

where the Jacobian of the transformation is $\theta_t/n$. We write the posterior of $\lambda_t$ and $\theta_t$ as follows:
$p''(\lambda_t, \theta_t) \propto \theta_t^{-(A+A'+T)/2-1}\,\lambda_t^{-A/2-1}(1-\lambda_t)^{-A'/2-1}$
$\quad \times \exp -\dfrac{1}{2\theta_t}\Big[\dfrac{C_Y}{\lambda_t} + \dfrac{nC_{Y'}}{1-\lambda_t} + n\sum_{t=1}^{T}(y_t - \bar{Y}_t')^2\Big].$    (6.7a)

Integrating with respect to $\theta_t$, we have

$p''(\lambda_t) \propto \dfrac{\lambda_t^{-A/2-1}(1-\lambda_t)^{-A'/2-1}}{\Big[C_Y/\lambda_t + nC_{Y'}/(1-\lambda_t) + n\sum_{t=1}^{T}(y_t - \bar{Y}_t')^2\Big]^{(A+A'+T)/2}}.$    (6.7b)


This posterior density is seen to be a special case of the density studied in Chapter 3, equation (3.4). We rewrite it in the following form:

$p''(\lambda_t) \propto \dfrac{\lambda_t^{(A'+T-2)/2}(1-\lambda_t)^{(A+T-2)/2}}{\Big[C_Y(1-\lambda_t) + nC_{Y'}\lambda_t + n\lambda_t(1-\lambda_t)\sum_{t=1}^{T}(y_t - \bar{Y}_t')^2\Big]^{(A+A'+T)/2}}.$

$E[\lambda_t|\text{data}]$ can be evaluated using this posterior density in the manner explained in Chapter 3, Section 3.1.3. It can also be calculated using numerical integration methods.
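The numerical-integration route can be sketched as follows. We assume the rewritten posterior kernel $\lambda_t^{(A'+T-2)/2}(1-\lambda_t)^{(A+T-2)/2} / [C_Y(1-\lambda_t) + nC_{Y'}\lambda_t + n\lambda_t(1-\lambda_t)S]^{(A+A'+T)/2}$, where $S = \sum_t (y_t - \bar{Y}_t')^2$ is supplied by the user. The posterior mean is then the ratio of two integrals over $(0, 1)$. The midpoint rule and the parameter names are illustrative choices and not the Chapter 3 series method. When $S = 0$ and $C_Y = nC_{Y'}$, the denominator is constant and the kernel is a Beta$((A'+T)/2, (A+T)/2)$ density. Its mean $(A'+T)/(A+A'+2T)$ gives a convenient check.

```python
def posterior_kernel_lambda(lam, A, A_prime, T, n, C_Y, C_Yp, S):
    """Unnormalized posterior density of lambda_t in the rewritten form above."""
    num = lam ** ((A_prime + T - 2) / 2) * (1.0 - lam) ** ((A + T - 2) / 2)
    base = C_Y * (1.0 - lam) + n * C_Yp * lam + n * lam * (1.0 - lam) * S
    return num / base ** ((A + A_prime + T) / 2)


def posterior_mean_lambda(A, A_prime, T, n, C_Y, C_Yp, S, grid=20_000):
    """E[lambda_t | data] as a ratio of midpoint-rule integrals over (0, 1)."""
    h = 1.0 / grid
    num = den = 0.0
    for k in range(grid):
        lam = (k + 0.5) * h  # midpoints avoid evaluating at 0 and 1
        w = posterior_kernel_lambda(lam, A, A_prime, T, n, C_Y, C_Yp, S)
        num += lam * w
        den += w
    return num / den


print(posterior_mean_lambda(A=2, A_prime=6, T=10, n=5, C_Y=5.0, C_Yp=1.0, S=3.0))
```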
Case 2. Unobserved, Stochastic Independent Variable with an Observed Proxy:

The model considered here is the same as in case (1), except that the investigator has data on an observed proxy of the independent variable. Consider the linear regression model

$Y_t = \alpha + \beta X_t^{*} + \varepsilon_t$    (6.8)

for all $t = 1, 2, \ldots, T$, where:

$Y_t$ : represents the dependent variable in time period $t$.
$X_t^{*}$ : represents the explanatory variable in time period $t$, which is assumed to be stochastic in this case but is unobserved.
$X_t$ : represents the observed proxy of the explanatory variable in time period $t$, which is stochastic.
$x_t$ : represents the sample mean of observations on $X_t$ in time period $t$.
$\alpha$ : represents the constant term of the regression model.
$\beta$ : represents the unknown parameter of the model.
$\varepsilon_t$ : represents the disturbance term in time period $t$, which is normally distributed with mean and variance given by
$E(\varepsilon_t) = 0,$
$E(\varepsilon_t^2) = \sigma_\varepsilon^2.$

This implies that the disturbances are homoscedastic. It is further assumed that the disturbances are serially uncorrelated.


This case differs from case (1) in that the investigator has data on a measured proxy for $X_t^{*}$, say a variable $X_t$. The estimation of $\alpha$ and $\beta$ therefore uses these data together with the data on $Y_t$, although, as in case (1), $X_t^{*}$ is not observed. A distributional assumption on $X_t$ enables us to set up an expectational scheme that links $X_t^{*}$ to the observed $X_t$. By transforming the specified model, we obtain a relationship between observed $Y_t$ and observed $X_t$, which then enables us to estimate $\alpha$ and $\beta$. This is the case analysed here.


The variable actually observed is $X_t$, and it is assumed to have the distribution

$X_t \sim N[\bar{X}_t, \sigma_X^2].$

A prior on $\bar{X}_t$ is assumed in time period $t$ of the form

$\bar{X}_t \sim N[\bar{X}_t', \sigma_{X'}^2],$

where $\bar{X}_t'$ is the subjective mean of $\bar{X}_t$. The posterior distribution of $\bar{X}_t$ in time period $t$ is obtained by combining the prior on $\bar{X}_t$ with the sample mean of the observed values of $X_t$. By Bayes' theorem, the posterior distribution of $\bar{X}_t$ is

$\bar{X}_t|\text{data} \sim N[\bar{X}_t'', \sigma_{X''}^2].$

The posterior mean $\bar{X}_t''$ conditional on $X_t^{*}$ is given by

$\bar{X}_t'' = \lambda_t\bar{X}_t' + (1 - \lambda_t)x_t,$    (6.9)

where $x_t$ is the sample mean of the observations on $X_t$ and

$\lambda_t = \sigma_X^2/[\sigma_X^2 + n\sigma_{X'}^2].$

The posterior variance is given by

$\sigma_{X''}^2 = [n/\sigma_X^2 + 1/\sigma_{X'}^2]^{-1}.$

As in the previous case, $n$ is the number of observations made on $X_t$ in the same time period $t$ for different cross-sectional units.
The analysis proceeds further by linking the prior mean of $\bar{X}_{t+1}$ to the prior mean of $\bar{X}_t$ by assuming a conditional distribution on $X_{t+1}^{*}$. In time period $t$, $\bar{X}_{t+1} = f(X_{t+1}^{*})$ is unknown to the econometrician because $X_{t+1}^{*}$ is unknown. The prior mean of $\bar{X}_{t+1}$ is obtained as follows:

$E[\bar{X}_{t+1}|X_{t+1}^{*}] = X_{t+1}^{*}.$

Now, we assume that $X_{t+1}^{*}$ has a distribution conditional on $X_t^{*}$ and $I_t$ (which contains all other relevant information in time period $t$) with mean $g(X_t^{*})$, where $g$ is a known function of $X_t^{*}$ given $I_t$. This analysis specifies neither the parametric form of this distribution nor its variance. Using the conditional distribution assumed above, the prior mean of $\bar{X}_{t+1}$ is

$E[\bar{X}_{t+1}|X_t^{*}, I_t] = E[X_{t+1}^{*}|X_t^{*}, I_t] = g(X_t^{*}).$

Thus, the prior mean of $\bar{X}_{t+1}$ is

$\bar{X}_{t+1}' = g(X_t^{*}).$

We do not calculate the variance of $\bar{X}_{t+1}$, as the analysis does not depend on it.
This prior-mean equation can be modified further and written as

$X_{t+1}^{*} - \lambda_t X_t^{*} = [mg(X_t^{*}) - \lambda_t X_t^{*}] + v_{t+1}$    (6.10)

given $X_t^{*}$ and $\lambda_t$ (defined before). Our analysis will assume that the variances defining $\lambda_t$ are unknown to the econometrician. Hence equation (6.10) will be evaluated with respect to the posterior distribution of $\sigma_X^2$ and $\sigma_{X'}^2$. Thus, we evaluate equation (6.10) with respect to the posterior expectation of $\lambda_t$, and letting $E[\lambda_t|\text{data}] = \bar{\lambda}_t$, we get:

$X_{t+1}^{*} - \bar{\lambda}_t X_t^{*} = [mg(X_t^{*}) - \bar{\lambda}_t X_t^{*}] + v_{t+1}.$    (6.11)

Equation (6.11) describes an expectation-generating scheme in which the expected or permanent (unobserved) value of $X_{t+1}^{*}$ is expressed as a function of $X_t^{*}$; $\bar{\lambda}_t$ can be evaluated as explained in case (1). Using equation (6.11) and the conditional distribution of $X_{t+1}^{*}$, we rewrite the model as

$(Y_{t+1} - \bar{\lambda}_t Y_t) = (1 - \bar{\lambda}_t)\alpha + \beta(mg(X_t^{*}) - \bar{\lambda}_t X_t^{*}) + (\varepsilon_{t+1} - \bar{\lambda}_t\varepsilon_t) + \beta v_{t+1}$    (6.12)

given $\bar{\lambda}_t$ and $X_t^{*}$. In effect, the unknown $\lambda_t$ has been estimated by its posterior mean, $\bar{\lambda}_t$. Equation (6.12) is conditional on $X_t^{*}$, which is assumed to be unknown, so it is not by itself operational from the point of view of estimation. However, the investigator would be able to express $(mg(X_t^{*}) - \bar{\lambda}_t X_t^{*})$ in terms of the observed $x_t$, depending on the type of experiment and/or dataset on hand. (This would essentially correspond to a growth rate in $X_t^{*}$ over time expressed in terms of $x_t$.) This would make equation (6.12) operational in estimating the parameters $\alpha$ and $\beta$ by maximum likelihood methods. Also, by incorporating prior information on the unknown parameters $\alpha$ and $\beta$, we could obtain Bayes estimates of $\alpha$ and $\beta$.


Case 3. Unobserved Dependent Variable

Several situations involve the desired level of a specific variable, such as the dependent variable, which remains unobserved. For example, a simple version of the stock adjustment model states that the stock of a commodity that a firm desires to hold is a function of sales. In general, this desired level of stock is neither known nor observed during the time period, but it would depend on the amount of sales. It is a level of stock that is theoretically optimal. Ex post, this desired level could be specified. We are trying to capture the learning that the firm does in trying to decide the correct ratio between the level of stock and actual sales. We consider this model, which is written as

$Y_t^{*} = \alpha + \beta X_t + \varepsilon_t$    (6.13)

for all $t = 1, 2, \ldots, T$, where:

$Y_t^{*}$ : represents the desired level of stock in time period $t$.
$Y_t$ : represents the actual level of stock in time period $t$.
$y_t$ : represents the sample mean of observations on $Y_t$ in time period $t$.
$X_t$ : represents sales of the firm in time period $t$, which is assumed to be stochastic in this case.
$\alpha$ : represents the constant term of the regression model.
$\beta$ : represents the unknown parameter of the model.
$\varepsilon_t$ : represents the disturbance term in time period $t$, which is normally distributed with mean and variance given by
$E(\varepsilon_t) = 0,$
$E(\varepsilon_t^2) = \sigma_\varepsilon^2.$

This implies that the disturbances are homoscedastic. The error term captures the deviation of the desired level of stock from the actual level of stock, in addition to other errors that might arise from sources such as model misspecification.

In this case, the dependent variable $Y_t^{*}$ is not observed. The investigator is assumed to have data on the independent variable $X_t$ and on a measured proxy variable for $Y_t^{*}$, say $Y_t$. Once more, direct estimation of $\alpha$ and $\beta$ is not possible.
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Utilizing an expectations scheme that emerges from Bayes’ 
theorem, the model is respecified to posit a relationship 


between observed Y, and X,. Estimates of g and f are now 
obtained as before. 
Suppose the actual level of stock of the firm is given by 


Y,|X, ~N [Y, 07] 
where the firm does not know the parameters Y, and a7 -A 
prior is specified on Y, which is normally distributed: 
YX; ~ N[¥',02,]. 


The posterior distribution of $Y_t^*$ in time period t is obtained by combining the prior on $Y_t^*$ with the sample mean of the observed values of $Y_t$. By Bayes’ theorem, the posterior distribution of $Y_t^*$ given the data is
$$Y_t^* \sim N[Y_t'', \sigma''^2].$$
The posterior mean $Y_t''$, given a value of $Y_t'$, is
$$Y_t'' = \lambda_t Y_t' + (1 - \lambda_t)\, y_t \qquad (6.14)$$
where $y_t$ is the sample mean of the n observations of $Y_t$ and
$$\lambda_t = \sigma^2 / [\sigma^2 + n\sigma_m^2],$$
and the posterior variance is given by
$$1/\sigma''^2 = [n/\sigma^2 + 1/\sigma_m^2].$$
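A minimal sketch of the normal-normal update behind (6.14), assuming, as above, that the n observations on $Y_t$ scatter around $Y_t^*$ with variance $\sigma^2$ and that the prior on $Y_t^*$ has variance $\sigma_m^2$, both treated as known here. The function name and the numbers are illustrative only; in the text the variances are unknown and $\lambda_t$ is later replaced by its posterior mean.

```python
def posterior_stock(prior_mean, prior_var, obs, obs_var):
    """Normal-normal update for the unobserved level Y_t^* (illustrative sketch).

    prior_mean, prior_var : Y_t' and sigma_m^2 of the prior on Y_t^*
    obs                   : the n observations on Y_t in period t
    obs_var               : sigma^2, variance of each observation around Y_t^*
    Returns (lambda_t, Y_t'', sigma''^2).
    """
    n = len(obs)
    y_bar = sum(obs) / n                                   # sample mean y_t
    lam = obs_var / (obs_var + n * prior_var)              # lambda_t in (6.14)
    post_mean = lam * prior_mean + (1.0 - lam) * y_bar     # Y_t''
    post_var = 1.0 / (n / obs_var + 1.0 / prior_var)       # 1/sigma''^2 = n/sigma^2 + 1/sigma_m^2
    return lam, post_mean, post_var


# Illustrative numbers: prior N[100, 4], four observations with sigma^2 = 16
lam, post_mean, post_var = posterior_stock(100.0, 4.0, [104.0, 106.0, 105.0, 109.0], 16.0)
print(lam, post_mean, post_var)  # 0.5 103.0 2.0
```

The weight form $\lambda_t Y_t' + (1-\lambda_t)y_t$ is algebraically the same as the usual precision-weighted average of the prior mean and the sample mean.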


The analysis proceeds further by linking the prior mean of $Y_{t+1}^*$ to the prior mean of $Y_t^*$ by assuming a conditional distribution on $Y_{t+1}^*$. Now, $Y_{t+1}^*$ is unknown to the econometrician in time period t. The prior mean $Y_{t+1}'$ is obtained as follows:
$$E[Y_{t+1}^* \mid Y_t^*] = Y_{t+1}'.$$
Now, we assume that $Y_{t+1}^*$ has a distribution conditional on $Y_t^*$ and $I_t$ (which contains all other relevant information in time period t) with a mean equal to $g(Y_t^*)$, where g is a known function given $I_t$. This analysis specifies neither the parametric form of this distribution nor its variance.

Using the assumption above, the prior mean of $Y_{t+1}^*$ is
$$E[Y_{t+1}^* \mid Y_t^*, I_t] = Y_{t+1}' = g(Y_t^*).$$
We do not calculate the variance of $Y_{t+1}^*$, as the analysis does not depend on it.


The expression above can be modified further and written as
$$Y_{t+1}^* - \lambda_t Y_t^* = [m\,g(Y_t^*) - \lambda_t Y_t^*] + v_{t+1} \qquad (6.15)$$
given $Y_t^*$ and $\lambda_t$ (defined before). Our analysis will assume that the variances defining $\lambda_t$ are unknown to the econometrician, and hence equation (6.15) will be evaluated with respect to the posterior distribution of $\sigma^2$ and $\sigma_m^2$.

Thus, we can evaluate equation (6.15) by taking the posterior expectation of $\lambda_t$:
$$Y_{t+1}^* - E[\lambda_t \mid \text{data}]\, Y_t^* = m\,g(Y_t^*) - E[\lambda_t \mid \text{data}]\, Y_t^* + v_{t+1},$$
given $Y_t^*$. Letting $E[\lambda_t \mid \text{data}] = \bar\lambda_t$, we get
$$Y_{t+1}^* - \bar\lambda_t Y_t^* = m\,g(Y_t^*) - \bar\lambda_t Y_t^* + v_{t+1} \qquad (6.16)$$
where $Y_t^*$ is given.
Using equation (6.16) and the distribution of $Y_t^*$ given by (6.13), we rewrite the model as
$$m\,g(Y_t^*) - \bar\lambda_t Y_t^* = (1 - \bar\lambda_t)\alpha + \beta(X_{t+1} - \bar\lambda_t X_t) + (\epsilon_{t+1} - \bar\lambda_t \epsilon_t) - v_{t+1}.$$
The equation can be further rewritten as
$$m\,g(Y_t^*) - \bar\lambda_t Y_t^* = (1 - \bar\lambda_t)\alpha + \beta(X_{t+1} - \bar\lambda_t X_t) + u_{t+1} \qquad (6.17)$$
where $u_{t+1} = \epsilon_{t+1} - \bar\lambda_t \epsilon_t - v_{t+1}$ and $\bar\lambda_t$ is known. In effect, the unknown $\lambda_t$ has been estimated by its posterior mean, $\bar\lambda_t$. Equation (6.17) is conditional on $Y_t^*$, which is unknown, and hence is not itself operational from the point of view of estimation. However, the investigator would be able to express $(m\,g(Y_t^*) - \bar\lambda_t Y_t^*)$ in terms of an observable quantity; this would depend on the type of experiment and/or dataset on hand. (This would essentially correspond to a growth rate in $Y_t^*$ over time.) This would make equation (6.17) operational for estimating the parameters $\alpha$ and $\beta$ by maximum likelihood methods. By methods described in Chapter 3, we could incorporate prior information on the unknown parameters to obtain Bayes estimates of $\alpha$ and $\beta$.
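Once the left-hand side of (6.17) has been expressed through an observable series (called `z` below, a hypothetical name), the model is a linear regression on the transformed regressors $(1-\bar\lambda_t)$ and $(X_{t+1} - \bar\lambda_t X_t)$. The sketch fits it by least squares on simulated data. Least squares coincides with maximum likelihood only if $u_{t+1}$ is treated as i.i.d. normal; the composite error in (6.17) generally has a moving-average structure that a full maximum likelihood treatment would have to model.

```python
import numpy as np

def fit_quasi_differenced(z, x, lam_bar):
    """Least-squares fit of z_t = (1 - lam_bar)*alpha + beta*(x_{t+1} - lam_bar*x_t) + u_{t+1}.

    z       : observable stand-in for m*g(Y_t^*) - lam_bar*Y_t^* (length T-1), hypothetical
    x       : regressor X_1, ..., X_T (length T)
    lam_bar : posterior mean of lambda_t, treated as known
    """
    x = np.asarray(x, dtype=float)
    x_star = x[1:] - lam_bar * x[:-1]                          # X_{t+1} - lam_bar * X_t
    design = np.column_stack([np.full(x_star.shape, 1.0 - lam_bar), x_star])
    (alpha_hat, beta_hat), *_ = np.linalg.lstsq(design, np.asarray(z, dtype=float), rcond=None)
    return alpha_hat, beta_hat


# Usage on simulated data (alpha = 3, beta = 0.8, lam_bar = 0.4 are made-up values)
rng = np.random.default_rng(0)
T, alpha, beta, lam_bar = 2000, 3.0, 0.8, 0.4
x = rng.normal(10.0, 2.0, T)
z = (1 - lam_bar) * alpha + beta * (x[1:] - lam_bar * x[:-1]) + rng.normal(0.0, 0.5, T - 1)
alpha_hat, beta_hat = fit_quasi_differenced(z, x, lam_bar)
```

The column of constants $(1-\bar\lambda_t)$ makes the coefficient on it equal to $\alpha$ itself, so no back-transformation is needed.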


6.2: Estimation of the Consumption Function Based on 
the Permanent Income Hypothesis 


This section illustrates the use of the theory developed 
above by considering a specific example: that of the 
consumption function. An aggregate consumption function 
with a distributed lag is considered and quarterly United 
States data are employed to estimate the parameters of the 
consumption function using a Bayesian method discussed 
above. 


Distributed lag models have played an increasingly 
important role in econometrics since the pioneering work of 
Koyck (1954). Several studies have developed methods to analyze distributed lag models in the context of the aggregate consumption function, such as Klein (1958), Solow (1960), Griliches (1967), and Griliches, Maddala, Lucas and Wallace (1962). Zellner and Geissel (1970) propose both a maximum likelihood approach and a Bayesian approach. Their Bayesian approach differs from the approach outlined here in that they employ the adaptive expectations formulation and assume a diffuse prior on the constant of the adaptive expectations scheme. As discussed above, we propose a Bayesian expectations scheme, which differs from theirs.


The aggregate consumption function is specified according to the permanent income hypothesis, which states that aggregate permanent consumption is a function of aggregate permanent income and is specified as
$$C_t^* = k^* Y_t^*$$
where $C_t^*$ is aggregate permanent consumption, $Y_t^*$ is aggregate permanent income and $k^*$ is a constant that depends upon several factors, including the rate of interest and the wealth of the group of consumers being considered. Permanent income represents the effect of the factors that the consumer regards as determining his wealth: personal attributes of the consumer, attributes of his occupation, the location of his economic activity, and wealth. The permanent component of income is like the expected value of a probability distribution. The transitory component of income includes accidental or chance occurrences that add to the income of the consumer. Several authors have studied the consumption function using this approach [see Friedman (1957), Zellner and Geissel (1970), Modigliani (1949)].⁶ The permanent levels of consumption and income are not directly observed, and we express the consumption function in terms of observed quantities: measured consumption and measured income.


We adopt a slightly different specification of the consumption function, in which measured consumption is a function of permanent aggregate wage income and aggregate profit income, as follows:
$$C_t = k_1 Y_t^* + k_2 P_t + u_t$$
for all t = 1, 2, ..., T, where $C_t$ is measured real consumption, $Y_t^*$ is permanent aggregate wage income (unobserved) and $P_t$ is aggregate profit income, which is assumed to be actually observed. It is assumed here that while permanent consumption depends on permanent wage income, it depends on measured profit income. This is because one would expect profit income to be more variable and unstable, and hence permanent consumption would be linked to the actual observed profit income, as the consumer has little notion of a permanent concept of profit income. Alternatively, we could have specified permanent consumption to depend on permanent profit without altering the analysis. The parameters $k_1$ and $k_2$ are marginal propensities to consume and $u_t$ is the transitory component of consumption.


This model exactly fits the model described in case (2). All the assumptions described in case (2) are assumed here, and by the same reasoning as before, we arrive at a distributed lag model as in equation (6.12), having the form:
$$C_{t+1} = \bar\lambda_t C_t + k_1(1 - \bar\lambda_t)\, y_t + k_2(P_{t+1} - \bar\lambda_t P_t) + v_{t+1}.$$
[This follows by quasi-differencing the specified consumption function (subtracting $\bar\lambda_t$ times the equation for period t from the equation for period t+1) and using the expectations scheme developed earlier on the unobserved $Y_t^*$ to simplify that term. In addition, $(m\,g(Y_t^*) - \bar\lambda_t Y_t^*)$ is assumed to be proportional to observed $y_t$.] We will assume that $v_t$ is normally distributed and is not autocorrelated; $y_t$ is the actual, observed wage income. We consider a subset of Klein’s (1950) data on the U.S. economy, used in the previous example in Chapter 3, for the years 1921-1929. The data comprise three national aggregates, viz., aggregate consumption, aggregate income from profit, and aggregate income from wages [see Vinod and Ullah (1981)]. Now, to evaluate $\bar\lambda_t$, we consider the posterior density of $\lambda_t$, which is obtained as in case (1), Section 6.1, as
$$p(\lambda_t \mid \text{data}) \propto \frac{\lambda_t^{A/2+(T-2)/2}\,(1-\lambda_t)^{A'/2+(T-2)/2}}{\left[C_1(1-\lambda_t) + C_2\lambda_t + \lambda_t(1-\lambda_t)\sum_{t=1}^{T}(y_t - \bar y)^2\right]^{(A+A')/2+(T-2)}}.$$

The prior parameters are chosen as follows: $C_1 = C_2 = 4$ and $A = A' = 4$. [The motivation for choosing these prior parameters is the same as discussed in the example in Chapter 3; see Section 3.2.4 of this book.] Also, from the data we get
$$\sum_{t=1}^{T}(y_t - \bar y)^2 = 827.28.$$

This enables us to write the posterior density as
$$p(\lambda_t \mid \text{data}) \propto \frac{\lambda_t^{(T+2)/2}\,(1-\lambda_t)^{(T+2)/2}}{[4(1-\lambda_t) + 4\lambda_t + \lambda_t(1-\lambda_t)(827.28)]^{T+2}}.$$
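The first-moment calculation can be sketched with a simple midpoint rule. The kernel below is the density above with T = 9 (the years 1921-1929), so that the exponents become 5.5 and 11; these exponents are our reading of a badly garbled original and should be treated as assumptions. Because $C_1 = C_2$ and the two numerator exponents are equal, the kernel is symmetric about 1/2, so its mean is exactly 0.50 whatever the common exponent, in agreement with the value reported below. Working on the log scale avoids underflow, since the unnormalized kernel is extremely small over the whole interval.

```python
import numpy as np

def posterior_mean_lambda(s=827.28, c1=4.0, c2=4.0, a=5.5, b=5.5, d=11.0, n=200_000):
    """E[lambda | data] for the kernel
    lambda^a (1-lambda)^b / [c1 (1-lambda) + c2 lambda + lambda (1-lambda) s]^d,
    by the midpoint rule on n equal cells of (0, 1). The exponents are assumed values.
    """
    lam = (np.arange(n) + 0.5) / n                             # cell midpoints
    log_kernel = (a * np.log(lam) + b * np.log1p(-lam)
                  - d * np.log(c1 * (1.0 - lam) + c2 * lam + lam * (1.0 - lam) * s))
    w = np.exp(log_kernel - log_kernel.max())                  # rescale before exponentiating
    return float(np.sum(lam * w) / np.sum(w))                  # first moment / normalizing constant


lam_bar = posterior_mean_lambda()
print(round(lam_bar, 4))  # 0.5
```

With $C_1 \neq C_2$ the symmetry breaks, and the numerical integral is then genuinely needed to locate the posterior mean.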


We obtain $E[\lambda_t \mid \text{data}]$ by numerically integrating the first moment of the (approximated) posterior density and dividing it by the constant of proportionality. This gives $E[\lambda_t \mid \text{data}] = 0.50$, so that $\bar\lambda_t = 0.50$. Using this value of $\bar\lambda_t$, we rewrite the model as follows:
$$C_{t+1} = 0.50\,C_t + 0.50\,k_1 y_t + k_2 P_t^* + v_{t+1},$$
where $P_t^* = P_{t+1} - 0.50\,P_t$. Using the data and maximum likelihood techniques, we obtain the maximum likelihood estimates of $k = 0.50\,k_1$ and $k_2$, given by the estimated model
$$C_{t+1} = 0.50\,C_t + k\,y_t + k_2 P_t^* + v_{t+1},$$
where $\hat k = 0.51$ and $\hat k_2 = 0.6975$. This gives us an estimated value of $\hat k_1 = 1.025$, where $k_1$ is the parameter in the original specification of the consumption function.



We now present the OLS estimates as well as the Bayes estimates:

1. OLS Estimates: The OLS solution yields $\hat k_1(0) = 1.025$ and $\hat k_2(0) = 0.6975$. The OLS estimates of $k_1$ and $k_2$ imply that the long run mpc from wage income is 1.025 and that from profit income is 0.6975. Thus, whereas consumption increases by about one dollar and 2 cents for a one dollar increase in wage income, it increases by about 70 cents for a one dollar increase in profit income. The corresponding short run mpc from wage income is 91 cents and that from profit income is 35 cents.

2. Bayes Estimates: As explained in Chapter 3, Section 3.1, we obtain the Bayes estimates in a similar fashion, using all notation developed earlier in Chapter 3 and assuming the prior on Σ to be an inverted Wishart distribution with parameters D, 4 and 2, where D is approximately equal to X'X, so that $\tilde{\Lambda} = I$. The posterior means of $\eta_1$ and $\eta_2$ are calculated (by numerical integration methods) to be
$$E[\eta_1 \mid y, X] = 0.0698 \quad\text{and}\quad E[\eta_2 \mid y, X] = 0.4075.$$
Substituting the values of the posterior means of $\eta_1$ and $\eta_2$, we get
$$\hat k_1(A) = 0.2263 \times 0.6975 + 0.2447 \times 1.025 - 0.2263 \times 0.6 + 0.7553 \times 0.7 = 0.8016$$
and
$$\hat k_2(A) = 0.1850 \times 0.6975 + 0.1603 \times 1.025 + 0.8150 \times 0.6 - 0.1603 \times 0.7 = 0.67,$$
where the prior mean of $k_1$ is assumed to be 0.7 and the prior mean of $k_2$ is assumed to be 0.6. From this, we see that the Bayes estimate of $k_1$, the long run mpc, is 1.025 and the short run estimate of mpc is 51 cents. [These calculations are similar to those made in the example considered in Chapter 3; the reader is referred to that example for further details.]
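The two Bayes calculations can be checked by writing them in matrix form as $\hat k(A) = k_0 + M(\hat k(0) - k_0)$, where $k_0 = (0.7, 0.6)'$ holds the prior means and M collects the printed weights (note that 0.2447 + 0.7553 = 1 and 0.1850 + 0.8150 = 1). This restatement is ours, read off from the arithmetic above; it does not derive M from the posterior means of $\eta_1$ and $\eta_2$.

```python
import numpy as np

k_ols = np.array([1.025, 0.6975])     # [k1(0), k2(0)], OLS estimates
k_prior = np.array([0.7, 0.6])        # prior means of k1 and k2
M = np.array([[0.2447, 0.2263],       # weights read off the printed arithmetic
              [0.1603, 0.1850]])

# Bayes estimate = prior mean + weighted deviation of the OLS estimate from the prior mean
k_bayes = k_prior + M @ (k_ols - k_prior)
print(k_bayes.round(4))  # [0.8016 0.6701]
```

Written this way, each Bayes estimate is visibly a compromise that starts at the prior mean and moves part of the way towards the OLS estimates.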


6.3: Bayesian Model of Duopoly Theory 


Cyert and DeGroot (1974) provide a framework for Bayesian analysis in the context of duopoly theory by developing two simple adaptive models of firms in a duopoly: models in which the two firms make their output decisions simultaneously, and models in which they make their production decisions alternately. In addition, the process of interfirm learning is examined [see Cyert and DeGroot (1973), Fried (1984)].

Earlier approaches, which were generally non-Bayesian, were nonadaptive in character and did not allow for a firm's capacity to learn aspects of its rival's decision-making behaviour. The use of Bayesian analysis removes this defect. The Bayesian approach leads to adaptive models in which the firm is able to change its assumption about the way in which its rival will respond to any changes the firm makes in the decision variable. Thus, for each decision period, the firm can have a different reaction function, chosen on the basis of the rival’s decisions over earlier periods of the process. We examine an adaptive model with a multiperiod horizon using a Bayesian analysis.

We consider the additive version of the model due to Cyert 
and DeGroot (1974). The reaction function considered in 
the model involves an unknown parameter and a random 
component. Consider a market comprising two firms producing the same commodity. Suppose the production process continues through n periods. For all j = 1, 2, ..., n, let $q_j$ denote the quantity to be produced by firm 1 in period j and let $r_j$ denote the quantity to be produced by firm 2 in period j. Let $Q_j$ denote the profit of firm 1 in period j, where $Q_j = q_j p_j$ for all j = 1, 2, ..., n, assuming no costs in this analysis. The demand curve in the market faced by the firms is
$$p_j = a - b q_j - c r_j$$
for all j, where a, b and c are positive constants. The profit function for firm 1 is
$$Q_j = q_j(a - b q_j - c r_j). \qquad (6.18)$$


It will be assumed that firm 2 chooses $r_j$ after knowing $q_j$, by employing the additive relationship
$$r_j = q_j + \theta + \epsilon_j \qquad (6.19)$$
for all j. If $\theta > 0$, firm 2 is trying to produce $\theta$ more than firm 1. In all n periods, $\theta$ is fixed and $\epsilon_j$ represents the random part of the choice of firm 2 in period j, possibly reflecting the inability of firm 2 to control output accurately or, for example, adjustments in production level on the basis of recent information about market conditions from a survey.

We assume that $\epsilon_1, \epsilon_2, \ldots, \epsilon_n$ are independently and identically normally distributed random variables with mean 0 and variance $\sigma^2$. Firm 1 is aware of equation (6.19), representing the reaction function of firm 2, and of the distribution of $\epsilon_j$, but does not know the exact value of $\theta$ that firm 2 is using.


We shall assume a prior distribution on the parameter $\theta$ for firm 1. Suppose $\theta \sim N[m, \sigma_m^2]$. In this analysis, it will be assumed that the variances $\sigma^2$ and $\sigma_m^2$ are unknown. This assumption is a departure from the analysis of Cyert and DeGroot, who assume that these variances are known. In any given period, firm 1 chooses the value of q and observes that firm 2 chooses r in response. By equation (6.19), we write
$$r_j - q_j - \theta \sim N[0, \sigma^2]$$
for a given value of $\theta$. After firm 1 observes r, the posterior distribution of $\theta$, $p''(\theta)$, is $N[m', \sigma'^2]$; by Bayes’ theorem, the posterior mean is
$$m' = \lambda m + (1 - \lambda)(r_j - q_j)$$
given $\sigma^2$ and $\sigma_m^2$, and the posterior variance is given by
$$1/\sigma'^2 = [1/\sigma^2 + 1/\sigma_m^2].$$
Since these variances are unknown, the posterior mean is evaluated as follows:
$$m' = E[\lambda \mid \text{data}]\, m + (1 - E[\lambda \mid \text{data}])(r_j - q_j)$$
where
$$\lambda = \sigma^2 / [\sigma^2 + \sigma_m^2].$$
Now, as seen in case (1), Section 6.1, $E[\lambda \mid \text{data}]$ can be evaluated, and once this is done, the posterior mean $m'$ is completely determined.
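A minimal sketch of firm 1's period-by-period updating in the known-variance case of Cyert and DeGroot; in the version above, $\lambda$ would be replaced by $E[\lambda \mid \text{data}]$. Each observed difference $r_j - q_j$ is one noisy reading of $\theta$, so the recursion reproduces the batch conjugate posterior. The function name and numbers are illustrative.

```python
def update_theta(m, var_m, diffs, var_eps):
    """Sequentially update theta ~ N[m, var_m] from observed d_j = r_j - q_j.

    var_eps is sigma^2, the variance of epsilon_j; both variances are assumed known here.
    """
    for d in diffs:
        lam = var_eps / (var_eps + var_m)             # weight on the current prior mean
        m = lam * m + (1.0 - lam) * d                 # posterior mean m'
        var_m = 1.0 / (1.0 / var_eps + 1.0 / var_m)   # posterior variance sigma'^2
    return m, var_m


# Two periods with r_j - q_j = 3, starting from theta ~ N[2, 1] and sigma^2 = 1
m_post, v_post = update_theta(2.0, 1.0, [3.0, 3.0], 1.0)
```

After two periods the posterior mean is 8/3 and the posterior variance 1/3, exactly what a single update on both observations would give; note that $\lambda$ rises from one period to the next as the prior tightens.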


6.3.1: Behaviour of Firm 1

It will be assumed that firm 1 will maximize its expected total profit over the n periods. Generally, one would expect the firm to maximize expected total profit over the n periods discounted over time, where the discount rate is a function of the interest rate and several other factors that the firm takes into account. However, in this particular model, the discount rate does not play a critical role, as the maximizing conditions over n periods reduce to maximizing in a single time period. Thus, we assume that the firm maximizes $E[Q_1 + Q_2 + \cdots + Q_n]$ using a sequential decision rule. As discussed by Cyert and DeGroot (1974), the method of backward induction is used to find an optimal sequential rule. This is done by considering the final time period n, when n − 1 periods are over. In the final time period, the firm decides $q_n$, the level of output to be produced. Since the profits in the first n − 1 periods are already decided, only $E(Q_n)$ needs to be maximized.


Assume that at the beginning of the nth period the prior on $\theta$ is $\theta \sim N[m, \sigma_m^2]$. Since $E[r_n] = q_n + m$, $E[Q_n]$ is written as
$$E[Q_n] = a q_n - b q_n^2 - c m q_n - c q_n^2$$
and is maximized when
$$q_n = \frac{a - cm}{2(b + c)};$$
the optimizing value of $E[Q_n]$ corresponding to this choice of $q_n$ is
$$E[Q_n] = \frac{(a - cm)^2}{4(b + c)}.$$
(This follows by maximizing $E[Q_n]$ with respect to $q_n$, obtaining the necessary first order condition and solving for $q_n$. For that choice of $q_n$, the sufficient second order condition is satisfied, since $\partial^2 E[Q_n]/\partial q_n^2 = -2(b + c) < 0$.)
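The closed form for the final period can be checked numerically. The sketch evaluates $E[Q_n] = a q_n - b q_n^2 - c q_n(q_n + m)$ and compares its value at $q_n = (a - cm)/2(b+c)$ with nearby outputs; the function names and parameter values are arbitrary choices for illustration.

```python
def expected_profit_last(q, a, b, c, m):
    """E[Q_n] when firm 2 responds with r_n = q_n + theta + eps and E[theta] = m."""
    return a * q - b * q**2 - c * q * (q + m)


def optimal_last_period(a, b, c, m):
    """Closed-form optimum: q_n = (a - cm)/2(b + c), E[Q_n] = (a - cm)^2/4(b + c)."""
    q_star = (a - c * m) / (2.0 * (b + c))
    return q_star, (a - c * m) ** 2 / (4.0 * (b + c))


a, b, c, m = 10.0, 1.0, 0.5, 2.0
q_star, best = optimal_last_period(a, b, c, m)   # q_star = 3.0, best = 13.5
```

Evaluating `expected_profit_last` slightly below or above `q_star` gives a smaller value, as the negative second derivative implies.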


In time period n − 1, the firm chooses the value of $q_{n-1}$ that maximizes $E[Q_{n-1}] + E[Q_n]$. Once more, suppose that at the beginning of time period n − 1 the prior on $\theta$ is $\theta \sim N[m, \sigma_m^2]$. Given $q_{n-1}$ and $r_{n-1}$, for an optimal choice of $q_n$,
$$E[Q_n] = \frac{(a - cm')^2}{4(b + c)},$$
where $m'$ is the posterior mean of $\theta$ in period n − 1. The distribution of $r_{n-1}$ is
$$r_{n-1} \sim N[m + q_{n-1},\; \sigma^2 + \sigma_m^2]$$
and the posterior mean $m'$ is given by
$$m' = \lambda m + (1 - \lambda)(r_{n-1} - q_{n-1}),$$
where $\lambda = \sigma^2/(\sigma^2 + \sigma_m^2)$, given $\sigma^2$ and $\sigma_m^2$. Substituting the value of $m'$, we obtain
$$E[Q_n \mid r_{n-1}, q_{n-1}, \sigma^2, \sigma_m^2] = \frac{(a - cm)^2}{4(b + c)} + (1 - \lambda)^2 c^2 g - \frac{2c(1 - \lambda)(a - cm)(r_{n-1} - q_{n-1} - m)}{4(b + c)},$$
where $g = (r_{n-1} - q_{n-1} - m)^2 / 4(b + c)$. Thus, with $\delta^2 = (\sigma^2 + \sigma_m^2)$,
$$E[Q_n \mid \sigma^2, \sigma_m^2] = \frac{(a - cm)^2}{4(b + c)} + \frac{(1 - \lambda)^2 c^2 \delta^2}{4(b + c)},$$
since $E(r_{n-1} - q_{n-1} - m) = 0$ and $E(r_{n-1} - q_{n-1} - m)^2 = \delta^2$. We write
$$E[Q_n] = \frac{(a - cm)^2}{4(b + c)} + \frac{c^2 E[(1 - \lambda)^2 \delta^2 \mid \text{data}]}{4(b + c)}.$$
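A Monte Carlo sketch of the step to $E[Q_n \mid \sigma^2, \sigma_m^2]$ for known variances: draw $\theta$ from firm 1's prior and $\epsilon_{n-1}$ from $N[0, \sigma^2]$, form $m'$, average $(a - cm')^2/4(b+c)$, and compare with the closed form. The learning term $(1-\lambda)^2 c^2 \delta^2/4(b+c)$ is small but visible. Parameter values, sample size and seed are arbitrary choices.

```python
import numpy as np

def check_learning_term(a, b, c, m, var_eps, var_m, n_draws=400_000, seed=0):
    """Monte Carlo vs closed form for E[Q_n] after one round of learning (known variances)."""
    rng = np.random.default_rng(seed)
    lam = var_eps / (var_eps + var_m)
    theta = rng.normal(m, np.sqrt(var_m), n_draws)                 # theta from firm 1's prior
    diff = theta + rng.normal(0.0, np.sqrt(var_eps), n_draws)      # r_{n-1} - q_{n-1}
    m_post = lam * m + (1.0 - lam) * diff                          # posterior mean m'
    mc = float(np.mean((a - c * m_post) ** 2 / (4.0 * (b + c))))   # optimal E[Q_n] given m'
    delta2 = var_eps + var_m
    exact = ((a - c * m) ** 2 + c**2 * (1.0 - lam) ** 2 * delta2) / (4.0 * (b + c))
    return mc, exact


a, b, c, m = 10.0, 1.0, 0.5, 1.0
mc, exact = check_learning_term(a, b, c, m, var_eps=1.0, var_m=2.0)
```

Dropping the second term of the closed form leaves a gap of about 0.056 here, well outside the simulation error, which shows that expected future profit genuinely benefits from learning about $\theta$.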


The posterior expectation $E[(1 - \lambda)^2 \delta^2 \mid \text{data}]$ can be evaluated by the method described in Section 6.1, case (1). The posterior density of $(\lambda, \delta)$ is derived in equation (6.75), from which the required expectation can be evaluated using the methods described in Chapter 3 or by numerical integration. Suppose the expectation, after evaluation, equals f. Then
$$E[Q_n] = \frac{(a - cm)^2}{4(b + c)} + \frac{c^2 f}{4(b + c)}$$
for any given choice of $q_{n-1}$. Since $E[Q_n]$ does not depend on $q_{n-1}$, $q_{n-1}$ can be chosen simply to maximize $E[Q_{n-1}]$, yielding
$$q_{n-1} = \frac{a - cm}{2(b + c)}, \qquad E[Q_{n-1}] = \frac{(a - cm)^2}{4(b + c)}.$$
Therefore, optimally,
$$E[Q_n] + E[Q_{n-1}] = \frac{(a - cm)^2}{2(b + c)} + \frac{c^2 f}{4(b + c)},$$
where m is the mean of the prior distribution of $\theta$ at the beginning of the (n − 1)th period. As summarized by Cyert and DeGroot (1974),


We can say that when the additive model is appropriate, the optimal sequential decision procedure is the myopic procedure whereby firm 1 makes its choice at each period without regard for future periods; that is, it makes a choice as if each period were the final one, or as if it were in a one period problem.
The posterior distribution of $\theta$ is updated by the firm in every period after observing the value of r. The Bayesian analysis presented above yields the outcome in a duopolistic market. This outcome differs from that of Cyert and DeGroot in that the uncertainty associated with the unknown variances is incorporated in the periodic updating of the posterior distribution of $\theta$. This would alter the expected profit level as well as the equilibrium level of output in the n-period problem considered.
Appendix II, p. 336. The normality of y follows from the
normal random variables is also normal.


6. The aggregate consumption function is discussed at great length, together with its microeconomic foundations, by Milton Friedman (1957), A Theory of the Consumption Function, Princeton Univ. Press, 1957, pp. 18-19, 115. In a study on consumption and savings behaviour, Franco Modigliani (1949), “Fluctuations in the Saving-Income Ratio: A Problem of Economic Forecasting”, Studies in Income and Wealth, New York, NBER, 1949, reports several estimated regressions of the consumption function based on the permanent income hypothesis, with the exception of results for Sweden; also, Zellner and Geissel (1970), “Analysis of Distributed Lag Models with Applications to Consumption Function Estimation”, Econometrica, 38 (1970), assume a consumption function based on this approach and analyze and estimate it; see pp. 865-867.


7 


A Model of Labour Supply 
Under Uncertainty 


7.1. Introduction 


The neoclassical model of labour supply is a certainty 
model with no stochastic aspects in its formulation, relying 
heavily upon the competitive market assumption to eliminate 
unemployment via flexible, market clearing wages.¹ With
the persistence of high unemployment rates in most labour 
markets since the 1970s, disequilibrium models have 
become more pertinent in the study of labour markets. 


The existence of unemployment in labour markets leads 
to a situation of uncertainty for currently employed workers 
as well as for potential entrants in the labour market deciding 
to look for jobs. Labour supply decisions by workers in the 
forthcoming time period would be made after incorporating 
this uncertainty. Several models in the literature [Hartley and Revankar (1974), Sjoquist (1976), Yaniv (1979)] have proposed modified versions of the certainty model of labour supply that provide a microeconomic foundation for making the individual’s labour supply depend on the unemployment rate. Another set of models in the literature, such as Snow and Warren (1986), Eaton and Rosen (1980) and Hanson and Menezes (1978), analyse other forms of uncertainty and their effects on labour supply as well as on savings. In the present context, uncertainty arises from workers' perceived prospects of unemployment in the present time period.


The model considered here assumes that each worker 
can anticipate being employed or being unemployed in any 
time period with a given probability that will depend on the 
existing unemployment rate. The labour supply function is derived in an expected utility maximizing framework. Comparative static results are obtained which show the dependence of the labour supply and the individual’s search intensity on the worker’s prior probability of unemployment. Section 7.2 presents the model and motivates the assumptions made in the model. Section 7.3 derives the labour supply function and obtains a comparative static property of labour supply that provides insight into the modified labour supply function. Macroeconomic implications of this analysis are discussed briefly in Section 7.4. Conclusions form Section 7.5.


7.2. The Model of Labour Supply 


Consider the labour supply decision in time period t of 
an individual who maximizes his expected utility, where 
utility is a function of income and leisure. Existing rates of 
unemployment, the nature and the state of the industry in 
which the worker works and business cycle fluctuations 
will all have varying impacts on the worker’s perceived 
prospects of unemployment in time period t.

Let $P_t$ denote the proportion of the total labour force in the economy who are willing to work but are unable to find jobs during time period t, so that $0 < P_t < 1$; $P_t$ is the unemployment rate that prevails in the economy in time period t. It is assumed that the prior distribution of $P_t$ is given by
$$P_t \sim D[f(S_t, \Gamma), \sigma_t^2]$$
where $f(S_t, \Gamma)$ is the prior mean and $\sigma_t^2$ is the prior variance.

The setting is a one-period, partial equilibrium analysis of the individual worker maximizing his expected utility. We will assume that the worker is risk averse and has a von Neumann-Morgenstern utility function that is twice differentiable, with continuous first and second partials, $U_1, U_2 > 0$ and $U_{11}, U_{22} < 0$. In the present context, utility takes on one value in the state of employment, with a certain probability, and another value in the state of unemployment, with the complementary probability. The expected utility is hence a discrete sum rather than an integral. This is written formally in equation (7.1) below.
The individual worker maximizes the expected utility with respect to the hours of labour supplied as well as the search intensity:
$$\max_{N_t, S_t} E(U_t) = (1 - f(S_t, \Gamma))\, U^A(Y_t + W_t N_t,\; T_t - N_t) + f(S_t, \Gamma)\, U^B(Y_t + u_t,\; T_t - S_t N_t) \qquad (7.1)$$
where the notation and symbols are listed below:

$E(U_t)$ represents the expected utility in time period t.
$U^A(\cdot)$ represents the utility of the individual in time period t in state A (employed).
$U^B(\cdot)$ represents the utility of the individual in state B (unemployed).
$f(S_t, \Gamma)$ is the worker’s subjective probability of unemployment in time period t.
$W_t$ is the market (real) wage rate in time period t, thought of as being non-stochastic.
$Y_t$ is the real non-labour income in period t, thought of as being non-stochastic.
$T_t$ is the total time in period t.
$u_t$ is the real unemployment compensation in time period t.
$N_t$ is the hours of labour supplied and is thought of as being non-stochastic.
$S_t$ is the search intensity of the worker in time period t, as a fraction of total hours of labour supplied.
$\Gamma$ is the risk parameter that allows uncertainty to be introduced in the model.
The framework of the model is that a basic state exists in which the worker enjoys his leisure with certainty; the utility that corresponds to this state is $U_0$. The worker enters the labour market if $E(U_t) > U_0$ and is then confronted by the two-state outcome discussed above. The worker maximizes his expected utility as stated in (7.1) at the beginning of time period t and is assumed to know $Y_t$ and $u_t$. For the currently employed worker, $W_t$ is the existing wage rate in the current job; for a new entrant in the labour market, $W_t$ would be the going market wage of the jobs that he is trying to obtain.


This framework combines the labour supply problem 
under uncertainty due to Sjoquist (1976), Yaniv (1979) with 
some features of Snow and Warren (1986), Eaton and Rosen 
(1980), although these latter models analyze different sources of uncertainty and their impacts on labour supply and saving.

The model assumes that hours not worked are distinct from leisure; that is, these hours could be spent in other activities such as searching for jobs. Thus, effort in state B is directed towards job search, which is assumed to be a fraction of the hours worked in state A, except that the wage rate is zero. This fraction is the search intensity of the worker in time period t, $S_t$, where $0 \le S_t \le 1$.² This implies that leisure in state B is $(T_t - S_t N_t)$.
The model also assumes that the prior mean of the unemployment rate has the form $f(S_t, \Gamma)$. The dependence on $S_t$ constitutes the gain from search, in that with increased search the probability of unemployment is lowered, so that $\partial f/\partial S_t = f_1 < 0$. The cost of search is reflected in state B, where increased search implies less leisure and hence a lower utility in state B. But for the above assumption, there would be no incentive to search. Further, $f(\cdot)$ depends on $\Gamma$, the risk parameter associated with the unemployment rate. Thus, an increase in $\Gamma$ induces an increase in risk about $f(S_t, \Gamma)$ in the sense of Rothschild and Stiglitz (1970), that is, a mean preserving spread in the distribution function for $f(S_t, \Gamma)$. Thus, $\partial f/\partial \Gamma = f_2 > 0$.


7.3. Analysis of Labour Supply in a Single Time Period

For a given time period t, the worker maximizes $E(U_t)$ with respect to $N_t$ and $S_t$, so that the first order conditions are:
$$\frac{\partial E(U_t)}{\partial N_t} = (1 - f(S_t, \Gamma))\, U_1^A W_t - (1 - f(S_t, \Gamma))\, U_2^A - f(S_t, \Gamma)\, S_t U_2^B = 0 \qquad (7.2)$$
$$\frac{\partial E(U_t)}{\partial S_t} = -f_1 U^A + f_1 U^B + f(S_t, \Gamma)\, U_2^B(-N_t) = 0 \qquad (7.3)$$
The optimal labour supply and search intensity functions satisfy these two equations. Here $U_{11}$, $U_{12}$, $U_{21}$ and $U_{22}$ denote the second partial derivatives of $E(U_t)$ with respect to $N_t$ and $S_t$. The second order conditions for a maximum are $U_{11} < 0$ and $U_{22} < 0$: a positive Hessian determinant
$$\begin{vmatrix} U_{11} & U_{12} \\ U_{21} & U_{22} \end{vmatrix} > 0$$
together with $U_{11} < 0$ implies $U_{22} < 0$ [see Henderson and Quandt (1988), p. 377].

The expressions for $D_1 = U_{11}$ and $D_2 = U_{22}$ are given in the appendix. Totally differentiating equation (7.2), we get
$$D_1\, dN_t = f_2 U_2^B(-S_t)\, d\Gamma - f_2(U_1^A W_t - U_2^A)\, d\Gamma.$$
Then,
$$\frac{dN_t}{d\Gamma} = -\frac{f_2 U_2^B S_t + f_2(U_1^A W_t - U_2^A)}{D_1}.$$
From equation (7.2), we have
$$U_1^A W_t - U_2^A = \frac{f(S_t, \Gamma)\, S_t U_2^B}{1 - f(S_t, \Gamma)}.$$
Substituting this value in the above equation, we get
$$\frac{dN_t}{d\Gamma} = -\frac{f_2 S_t U_2^B}{(1 - f(S_t, \Gamma))\, D_1},$$
which is positive since $-D_1 > 0$. This comparative static result implies that as the risk parameter $\Gamma$ increases, the optimal labour supply also increases. Under risk aversion, the worker hedges against this additional uncertainty by choosing to increase his labour effort. This result is consistent with similar results by Yaniv (1979) and Sjoquist (1976).³

The comparative static exercise with respect to $S_t$ is carried out next. Totally differentiating equation (7.3), we get
$$D_2\, dS_t = f_2 U_2^B(-N_t)\, d\Gamma,$$
so that
$$\frac{dS_t}{d\Gamma} = -\frac{f_2 N_t U_2^B}{D_2},$$
assuming $f_{12} = 0$, in which case $dS_t/d\Gamma > 0$; that is, for higher values of $\Gamma$, the worker’s search intensity also increases.


7.4. Macroeconomic Implications 


The microeconomic analysis assumed that the worker is 
risk averse. Suppose we choose a specific measure of risk 
aversion, say, Decreasing Absolute Risk Aversion (DARA) 
due to Arrow (1965) and Pratt (1964) and assume that 
workers are risk averse in the sense of DARA. To draw any macroeconomic implications, we need to see how relevant this assumption is empirically. This assumption has been widely used in studies of portfolio and saving behaviour as well as in studies of labour market behaviour under uncertainty. Moreover, Cohn et al. (1975) and Projector and Weiss (1966) have presented empirical evidence that is consistent with DARA.

If DARA is considered a reasonable assumption regarding attitudes towards risk, then our analysis implies that an increase in the uncertainty of future employment prospects would lead to an increase in labour supply as well as in search intensity.

We briefly point out the macroeconomic implications of our analysis. Consider a standard IS-LM model of aggregate demand and assume a classical (perfectly inelastic) aggregate supply curve. As seen above, under reasonable assumptions about attitudes towards risk, an increase in future employment uncertainty increases both labour supply and search intensity. The increase in labour supply increases employment and hence aggregate output; that is, the aggregate supply curve shifts to the right, which lowers the price level. Further, the LM curve shifts to the right due to the fall in the price level, thereby decreasing the real interest rate. These conclusions are fairly similar to those arrived at by Snow and Warren (1986) in a different context.


7.5. Conclusions 


The paper analyses the effect of increased uncertainty 
about future employment prospects on labour supply and 
search intensity. This is done in a one period expected utility 
maximizing framework. The expected utility is written by 
considering the utility in both the states of employment and 
unemployment. We find that as uncertainty of future 
employment prospects increases, the worker’s labour supply 
increases as well as his search intensity. This result is 
obtained by assuming that the worker is risk averse. 
Macroeconomic implications follow from empirically plausible assumptions about the worker’s attitude towards risk. Empirical evidence is consistent with decreasing absolute risk aversion (DARA). If we assume DARA, then it follows that an increase in the uncertainty of future employment will increase employment and output and lower both the price level and the real rate of interest.
Notes

1. See Henderson and Quandt (1988), Microeconomic Theory, Third edition, McGraw Hill, N.Y.

2. The leisure of the worker in state B is assumed to be $(T_t - S_t N_t)$, so that the maximum leisure enjoyed by the worker in state B is $T_t$ [$S_t = 0$; corresponds to Hartley and Revankar (1976)]. The minimum leisure is $T_t - N_t$ [$S_t = 1$; corresponds to the case discussed by Sjoquist (1976)]. The actual leisure of the worker thus depends on $S_t$, which depends on the individual worker’s motivation to search and other economy-wide factors.

3. Sjoquist (1976) reports (p. 930, Eq. 5) that $\partial N/\partial U < 0$, which Yaniv (1979) later argues is an unreasonable conclusion resulting from an unrealistic assumption. However, Sjoquist omits a negative sign in the numerator of Eq. 5, p. 930, and if we correct for this we obtain
$$\frac{\partial N}{\partial U} > 0,$$
which is consistent with Yaniv (1979) as well as with our results.
Unobservable Variables and
Measurement Errors in 
Econometric Models 


8.1. Introduction 


Economists often work with datasets that consist of
variables that are either not observed or not accurately
measured. This paper focuses on the problem of estimating
the unknown parameter vector in the linear regression
context when there is error in measuring the independent
variables, in addition to the fact that some variables may
not be observed. The famous permanent income hypothesis
due to Friedman (1957) illustrates the problem of
unobservable variables: the permanent income of an
individual is not observed. [See Feldstein (1974),
Goldberger (1972), Griliches (1974), Levi (1973), Levi (1977),
Nerlove (1971) and Zellner (1970) for other examples in the
literature.]


This paper proposes a Bayesian estimation procedure
which takes into account the indeterminacy of the MLE
approach (Judge et al. (1980), p. 514) and generalizes the
Bayesian approach of Zellner (1970). A class of informative
priors is assumed on the variance of the error term and on
the variance of the prior on the parameter vector. Inference
with respect to these unknown variances enables us to
evaluate the Bayes estimator in a novel way, leading to a
new estimator of the parameter vector. This estimator is
proved to be admissible.

Section 8.2 presents the regression model with explicit
assumptions that incorporate the measurement errors and
the fact that some independent variables may not be
observed. The prior distribution assumed on $\beta$ is discussed.
The induced prior on $\alpha$, the reparametrized vector, is derived.
In Section 8.3, alternative estimators of $\beta$ in the literature
are discussed together with the Bayes estimator of $\beta$. The
properties of the Bayes estimator are presented in Theorem 8.1.
Section 8.4 derives the posterior distribution of the unknown
variance components. This in turn enables us to obtain the
posterior distribution of a 'shrinkage' factor (the Bayes
estimator depends on this 'shrinkage' factor). We can now
evaluate the Bayes estimator in a novel way. Conclusions
are drawn in Section 8.5.


8.2. The Linear Regression Model 


It is only in rare cases that the measurements of the 
independent variables are absolutely error-free. Generally, 
economic data derived from government or private 
publications report only significant digits. These rounding-off
errors are a likely cause of error in the measurement of
variables. In addition, some variables may have inherent 
difficulties in their measurement. Some variables may not 
be observed at all. 


We will present and consider the problem of errors in
variables and unobserved variables using the following model
as an example (see Note 1):


$$y = X\beta + \epsilon, \qquad \tilde{X} = X + V \qquad (1)$$

where:
$y$: $T \times 1$ vector of observations on the dependent variable.
$X$: $T \times K$ matrix of the true stochastic independent
variables that are not accurately observed.
$\tilde{X}$: $T \times K$ matrix of observed explanatory variables, all of
which are assumed to be stochastic and distributed
independently of $(\beta, \sigma^2)$; the matrix $\tilde{X}'\tilde{X}$ is generally
assumed to be non-singular.
$\beta$: $K \times 1$ vector of unknown parameters of the model,
assumed to have a normal prior, $N[\mu, \sigma_\beta^2 I]$.
$\epsilon$: $T \times 1$ vector of spherical disturbances distributed as
$N[0, \sigma^2 I]$.
$V$: $T \times K$ matrix of measurement errors assumed to have
an $N[0, \sigma_V^2 I]$ distribution (see Note 2).

A multivariate normal prior on $\beta$ is assumed in this
analysis. This prior represents the investigator's subjective
beliefs about $\beta$ and essentially captures his uncertainty
about $\beta$. In the classical framework, this would be a random
effects model. The prior on $\beta$ is consistent with the
assumption that $\beta$ is in fact truly random and could have
been drawn from a multivariate normal distribution. The
analysis here is in the tradition of random coefficient
regression models and is a generalization of work by B.M.
Hill (1967, 1969).
A multivariate transformation that reparametrizes the
parameter vector $\beta$ is considered. This transformation is a
convenient intermediate step in the estimation of $\beta$.
8.3. Description of the Transformation
Consider the model in (1) and assume that $\tilde{X}'\tilde{X}$ and $V'V$
are positive definite. There exist orthogonal $(K \times K)$ matrices
$A, A_V$ such that

$$A'(\tilde{X}'\tilde{X})A = \Lambda \quad \text{and} \quad A_V'(V'V)A_V = \Lambda_V,$$

where $\Lambda = \mathrm{diag}(\lambda_1, \lambda_2, \ldots, \lambda_K)$, the $\lambda_i$ being the eigenvalues
of $\tilde{X}'\tilde{X}$, and $\Lambda_V$ is the matrix of eigenvalues of $V'V$. The
transformation matrix $A$ depends on $\tilde{X}$, and $A_V$ depends on
$V$. Using this transformation, the model in (1) is
reparametrized as shown below.
The model in (1) is now written as

$$y = Z\alpha + U\gamma + \epsilon \qquad (2)$$
where:

$Z = \tilde{X}A$ and $Z'Z = \Lambda$,
$\alpha = A'\beta$,
$U = VA_V$ and $U'U = \Lambda_V$,
$\gamma = A_V'\beta$.
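The reparametrization can be checked numerically. The sketch below is illustrative only: it assumes NumPy, a simulated design `X_obs` standing in for $\tilde{X}$, and arbitrary coefficients `beta`. It takes $A$ from `numpy.linalg.eigh` (eigenvalues of a symmetric matrix with orthonormal eigenvectors) and confirms that $Z'Z = \Lambda$, $Z\alpha = \tilde{X}\beta$ and $\beta = A\alpha$.

```python
import numpy as np

def reparametrize(X_obs, beta):
    """Return A, Z = X_obs A, the eigenvalues of X_obs'X_obs, and alpha = A' beta."""
    lam, A = np.linalg.eigh(X_obs.T @ X_obs)  # A'(X'X)A = diag(lam), A orthogonal
    Z = X_obs @ A
    alpha = A.T @ beta
    return A, Z, lam, alpha

rng = np.random.default_rng(0)
T, K = 50, 3
X_obs = rng.normal(size=(T, K))      # simulated stand-in for the observed regressors
beta = np.array([1.0, -2.0, 0.5])    # arbitrary illustrative coefficients
A, Z, lam, alpha = reparametrize(X_obs, beta)

print(np.allclose(Z.T @ Z, np.diag(lam)))    # Z'Z = Lambda
print(np.allclose(Z @ alpha, X_obs @ beta))  # Z alpha = X~ beta
print(np.allclose(A @ alpha, beta))          # beta = A alpha
```

Because $A$ is orthogonal, the rotation loses no information; it only diagonalizes $Z'Z$, which is what makes the componentwise formulas that follow possible.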


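The prior assumed on $\beta$ induces a prior on $\alpha$ given the data.
The moments of the prior on $\alpha$ are obtained as follows:

$$E(\alpha \mid \tilde{X}) = A'\mu, \qquad \mathrm{Cov}(\alpha \mid \tilde{X}) = A'(\sigma_\beta^2 I)A = \sigma_\alpha^2 I,$$

where $\sigma_\alpha^2 = \sigma_\beta^2$. The prior distribution on $\alpha$ is $N[A'\mu, \sigma_\alpha^2 I]$
conditional on $\tilde{X}$.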


3. The Bayes estimator of the unknown parameters $\alpha$ and $\beta$

Consider the reparametrized model in (2), given by

$$y = Z\alpha + U\gamma + \epsilon.$$


In this section, the focus is on the estimation of $\alpha$, which
forms the basis for estimating $\beta$. The reparametrized model
obtained from the transformation is a convenient
intermediate step in the estimation of $\beta$. Similar econometric
models have been analyzed in the literature in both a
classical and a Bayesian framework. It is well known
that the OLS estimator in such a context is not only biased
but also inconsistent. [See Judge et al. (1980), p. 514.] A maximum
likelihood procedure is proposed (Judge et al. (1980), pp.
518-531), but the model is not identified and the procedure does not
yield an explicit solution to the estimation problem owing to insufficient
information. For a complete solution to the MLE approach,
it is suggested that further information is needed on the
variance component $\sigma^2$ and/or $\sigma_V^2$. On both counts (the
fact that the OLS estimator is biased and that the MLE
approach is not identified), it is felt that a Bayesian approach
is natural and justified in this case.


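The Bayes estimator of $\alpha$ in (2) is written as

$$\hat{\alpha}(B) = E[\alpha \mid y, \tilde{X}, \sigma^2, \sigma_\alpha^2, \delta] = W^{-1}(Z'Z/\delta)\,\hat{\alpha}(0) + (I - W^{-1}Z'Z/\delta)\,A'\mu,$$

where $\hat{\alpha}(0) = (Z'Z)^{-1}Z'y$ is the OLS estimator of $\alpha$, $W = (Z'Z/\delta + K)$,
$K = (\sigma^2/\sigma_\alpha^2)I$, and $\delta = (1 + \sigma_V^2\sigma_\alpha^2/\sigma^2)$. [See Bhat (1988), Ch. 2 for an exact
derivation of the Bayes estimator in the linear regression
model.] (See Note 3.)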


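The $i$-th component of this estimator is $\hat{\alpha}_i(B) = \eta_i\hat{\alpha}_i(0) +
(1 - \eta_i)a_i'\mu$, $i = 1, 2, \ldots, K$, where $\eta_i = \lambda_i\sigma_\alpha^2/(\sigma^2\delta + \lambda_i\sigma_\alpha^2)$, $a_i$ is the $i$-th
column of $A$, and $\sigma^2$, $\sigma_\alpha^2$ and $\delta$ are given. This analysis proceeds
in Section 4 to assume priors on $(\sigma^2, \sigma_V^2, \sigma_\alpha^2)$ and obtain the
posterior density of $(\sigma^2, \sigma_V^2, \sigma_\alpha^2)$. Further, the posterior
density of $\eta_i$ will be obtained. We evaluate $\hat{\alpha}_i(B)$ by taking
the posterior expectation with respect to $\eta_i$:

$$\hat{\alpha}_i(B) = E[\eta_i \mid \text{data}]\,\hat{\alpha}_i(0) + (1 - E[\eta_i \mid \text{data}])\,a_i'\mu.$$

The following theorem establishes that $\hat{\alpha}_i(B)$ is admissible for $\alpha_i$.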


Theorem 8.1:

1. Consider the reparametrized linear regression model
in (2) (where $\mu = 0$) with measurement errors and a
normal prior on $\alpha$ (the vector of coefficients). Let the
Bayes estimator of $\alpha_i$ be $\hat{\alpha}_i(B)$ for all $i$. Then,

$$\hat{\alpha}_i(B) = E[\eta_i \mid \text{data}]\,\hat{\alpha}_i(0), \qquad i = 1, 2, \ldots, K,$$

where $\eta_i = \lambda_i\sigma_\alpha^2/(\sigma^2\delta + \lambda_i\sigma_\alpha^2)$. $\hat{\alpha}_i(B)$ is admissible for
$\alpha_i$ under a squared error loss function.
2. For the model specified in (1), the Bayes estimator of $\beta$
is

$$\hat{\beta}(B) = A\hat{\alpha}(B),$$

which is admissible for $\beta$.

Proof
1. The first part of this theorem has been shown by writing
out the Bayes estimator for $\alpha_i$. This estimator is seen
to be admissible by applying the results in Leamer
(1978), p. 141 to model (2).
2. Since $\beta = A\alpha$, it follows that the Bayes estimator of $\beta$
is given by $\hat{\beta}(B) = A\hat{\alpha}(B)$. As in 1, $\hat{\beta}(B)$ is admissible
for $\beta$. Q.E.D.
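With the variances treated as known, the componentwise form $\eta_i\hat{\alpha}_i(0) + (1-\eta_i)a_i'\mu$ should coincide with the matrix form $W^{-1}(Z'Z/\delta)\hat{\alpha}(0) + (I - W^{-1}Z'Z/\delta)A'\mu$. The following is a minimal NumPy sketch of that equivalence, assuming simulated data, a hypothetical prior mean `mu`, and arbitrary illustrative values for $\sigma^2$, $\sigma_\alpha^2$, $\sigma_V^2$; the helper `bayes_alpha` is a name introduced here for illustration.

```python
import numpy as np

def bayes_alpha(lam, alpha_ols, prior_mean_alpha, sigma2, sigma2_alpha, sigma2_V):
    """Componentwise eta_i * alpha_i(0) + (1 - eta_i) * a_i'mu, variances taken as known."""
    delta = 1.0 + sigma2_V * sigma2_alpha / sigma2
    eta = lam * sigma2_alpha / (sigma2 * delta + lam * sigma2_alpha)
    return eta * alpha_ols + (1.0 - eta) * prior_mean_alpha, eta

rng = np.random.default_rng(1)
T, K = 60, 3
X_obs = rng.normal(size=(T, K))
y = X_obs @ np.array([1.0, -2.0, 0.5]) + rng.normal(size=T)
lam, A = np.linalg.eigh(X_obs.T @ X_obs)
Z = X_obs @ A
alpha_ols = (Z.T @ y) / lam                     # (Z'Z)^{-1} Z'y, since Z'Z is diagonal
mu = np.array([0.5, -1.0, 0.0])                 # hypothetical prior mean of beta
sigma2, sigma2_alpha, sigma2_V = 1.0, 2.0, 0.3  # illustrative "given" variances

alpha_B, eta = bayes_alpha(lam, alpha_ols, A.T @ mu, sigma2, sigma2_alpha, sigma2_V)

# Matrix form with W = Z'Z/delta + K and K = (sigma^2 / sigma_alpha^2) I
delta = 1.0 + sigma2_V * sigma2_alpha / sigma2
ZtZ_d = np.diag(lam) / delta
W_inv = np.linalg.inv(ZtZ_d + (sigma2 / sigma2_alpha) * np.eye(K))
alpha_B_matrix = W_inv @ ZtZ_d @ alpha_ols + (np.eye(K) - W_inv @ ZtZ_d) @ (A.T @ mu)

beta_B = A @ alpha_B                            # beta(B) = A alpha(B), as in Theorem 8.1
```

Since $W$ is diagonal here, $W^{-1}Z'Z/\delta = \mathrm{diag}(\eta_1, \ldots, \eta_K)$, so each $\eta_i$ lies in $(0,1)$ and shrinks the OLS component towards the prior mean.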
4. The posterior distribution of the unknown variance
components $(\sigma^2, \sigma_V^2, \sigma_\alpha^2)$

In this section, we derive the posterior density of the
unknown variance components $(\sigma^2, \sigma_V^2, \sigma_\alpha^2)$. In turn, the
posterior density of $\eta_i$ (which depends on $\sigma^2$, $\sigma_V^2$ and $\sigma_\alpha^2$)
can be obtained. $E[\eta_i \mid \text{data}]$ is obtained, which enables us to
make operational the estimator $\hat{\alpha}_i(B)$ presented in Section
8.3.
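For model (2), the likelihood function is a function of
$(\alpha, \sigma^2)$:

$$L(\alpha, \sigma^2) \propto (\sigma^2\delta)^{-T/2}\exp\left[-\frac{1}{2\sigma^2\delta}(y - Z\alpha)'(y - Z\alpha)\right],$$

where $\alpha$ depends on $\tilde{X}$ and $\beta$.
Incorporating the prior on $\alpha$, it is easily seen that

$$y = Z\alpha + U\gamma + \epsilon \sim N[0, \Sigma] \quad \text{(see Note 4)},$$

where $\Sigma = \mathrm{Cov}(y \mid \tilde{X}, V) = [\sigma^2 I + \sigma_\alpha^2 ZZ' + \sigma_V^2\sigma_\alpha^2 I]$. The likelihood
function is written as a function of $(\sigma^2, \sigma_V^2, \sigma_\alpha^2)$: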
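$$L(\sigma^2, \sigma_V^2, \sigma_\alpha^2) \propto |\Sigma|^{-1/2}\exp\left(-\tfrac{1}{2}\,y'\Sigma^{-1}y\right).$$

On simplification (see Appendix G), with $q = Z'y$ and $t = \sigma_\alpha^2/\sigma^2$,

$$L(\sigma^2, \sigma_V^2, \sigma_\alpha^2) \propto (\sigma^2\delta)^{-T/2}\prod_{i=1}^{K}\left(1 + \frac{\lambda_i t}{\delta}\right)^{-1/2} \times \exp\left\{-\frac{1}{2\sigma^2\delta}\left[y'y - \frac{t}{\delta}\,q'\left(\frac{t}{\delta}\Lambda + I\right)^{-1}q\right]\right\}.$$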
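The simplification replaces a $T \times T$ determinant and inverse by quantities that need only the $K$ eigenvalues $\lambda_i$ and $q = Z'y$. A minimal NumPy sketch, assuming simulated data and arbitrary illustrative variance values, compares the direct computation of $\log|\Sigma|$ and $y'\Sigma^{-1}y$ with the closed forms used above (note that $\sigma^2 I + \sigma_V^2\sigma_\alpha^2 I = \sigma^2\delta I$).

```python
import numpy as np

rng = np.random.default_rng(2)
T, K = 40, 3
X_obs = rng.normal(size=(T, K))
lam, A = np.linalg.eigh(X_obs.T @ X_obs)
Z = X_obs @ A                                     # Z'Z = diag(lam)
y = rng.normal(size=T)
sigma2, sigma2_V, sigma2_alpha = 0.8, 0.3, 1.5    # arbitrary illustrative values
t = sigma2_alpha / sigma2
delta = 1.0 + sigma2_V * sigma2_alpha / sigma2

# Direct route: Sigma = sigma^2 I + sigma_alpha^2 ZZ' + sigma_V^2 sigma_alpha^2 I
Sigma = (sigma2 + sigma2_V * sigma2_alpha) * np.eye(T) + sigma2_alpha * (Z @ Z.T)
sign, logdet_direct = np.linalg.slogdet(Sigma)
quad_direct = y @ np.linalg.solve(Sigma, y)

# Closed forms: only the K eigenvalues and q = Z'y are needed
q = Z.T @ y
logdet_closed = T * np.log(sigma2 * delta) + np.sum(np.log1p(lam * t / delta))
quad_closed = (y @ y - (t / delta) * np.sum(q**2 / (1.0 + t * lam / delta))) / (sigma2 * delta)
```

The closed forms cost $O(TK)$ once $q$ is available, which matters when the likelihood has to be evaluated repeatedly inside a numerical integration.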
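A class of informative priors is assumed on the unknown
variance components $\sigma^2$, $\sigma_V^2$ and $\sigma_\alpha^2$, from the inverted
gamma family of densities. Assuming that the priors on
$\sigma^2$, $\sigma_V^2$ and $\sigma_\alpha^2$ are independent, the joint prior on $(\sigma^2, \sigma_V^2, \sigma_\alpha^2)$ is

$$p(\sigma^2, \sigma_V^2, \sigma_\alpha^2) \propto (\sigma^2)^{-A/2-1}(\sigma_V^2)^{-A_V/2-1}(\sigma_\alpha^2)^{-A_\alpha/2-1} \times \exp\left[-\tfrac{1}{2}\left(\frac{C_0}{\sigma^2} + \frac{C_V}{\sigma_V^2} + \frac{C_\alpha}{\sigma_\alpha^2}\right)\right].$$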
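By Bayes' theorem, the posterior density of
$(\sigma^2, \sigma_V^2, \sigma_\alpha^2)$ is

$$p'(\sigma^2, \sigma_V^2, \sigma_\alpha^2) \propto L(\sigma^2, \sigma_V^2, \sigma_\alpha^2)\,p(\sigma^2, \sigma_V^2, \sigma_\alpha^2).$$

Thus,

$$p'(\sigma^2, \sigma_V^2, \sigma_\alpha^2) \propto (\sigma^2)^{-(T+A)/2-1}(\sigma_V^2)^{-A_V/2-1}(\sigma_\alpha^2)^{-A_\alpha/2-1} \times \delta^{-T/2}\left(1 + \frac{\lambda t}{\delta}\right)^{-K/2}\exp\left\{-\tfrac{1}{2}[H_0 + H_1]\right\},$$

where

$t = \sigma_\alpha^2/\sigma^2$,

$\delta = (1 + \sigma_V^2\sigma_\alpha^2/\sigma^2)$,

$H_0 = C_0/\sigma^2 + C_V/\sigma_V^2 + C_\alpha/\sigma_\alpha^2$,

$H_1 = \dfrac{1}{\sigma^2\delta}\left[y'y - \dfrac{t}{\delta}\left(1 + \dfrac{\lambda t}{\delta}\right)^{-1}q'q\right]$.

This posterior density is written for the important special
case in which $\lambda_i = \lambda$ for all $i$.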
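On further simplification and after transforming the
variables, we obtain the posterior density of $(t, \delta)$ as (see Note 5)

$$p''(t, \delta) \propto t^{(A_V - A_\alpha)/2 - 1}(\delta - 1)^{-A_V/2-1}\,\delta^{-T/2}\left(1 + \frac{t\lambda}{\delta}\right)^{-K/2}\exp\left\{-\frac{C_V t}{2(\delta - 1)}\right\} \times \left[C_0 + \frac{C_\alpha}{t} + \frac{1}{\delta}\left(y'y - \frac{t}{\delta + t\lambda}\,q'q\right)\right]^{-(T+A+A_\alpha)/2}.$$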


The parameter of interest being $\eta_i = \eta = \lambda t/(\delta + \lambda t)$, we
transform from the $(t, \delta)$ space to the $(\eta, \delta)$ space, and the
posterior density of $\eta$ is stated in Theorem 8.2.
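The change of variables rests on three small maps: $(\sigma^2, \sigma_V^2, \sigma_\alpha^2) \mapsto (t, \delta)$, the inverse $t = \eta\delta/[\lambda(1-\eta)]$ at fixed $\delta$, and its Jacobian $\partial t/\partial\eta = \delta/[\lambda(1-\eta)^2]$. The sketch below writes them out in plain Python and checks the Jacobian against a central finite difference; the function names are hypothetical helpers, and the numerical inputs are arbitrary.

```python
def t_delta_eta(sigma2, sigma2_V, sigma2_alpha, lam):
    """Map the variance components to t, delta and the shrinkage factor eta."""
    t = sigma2_alpha / sigma2
    delta = 1.0 + sigma2_V * t          # = 1 + sigma_V^2 sigma_alpha^2 / sigma^2
    eta = lam * t / (delta + lam * t)
    return t, delta, eta

def t_from_eta(eta, delta, lam):
    """Inverse map used in the change of variables (t, delta) -> (eta, delta)."""
    return eta * delta / (lam * (1.0 - eta))

def jacobian_dt_deta(eta, delta, lam):
    """dt / d(eta) at fixed delta."""
    return delta / (lam * (1.0 - eta) ** 2)

t, delta, eta = t_delta_eta(sigma2=0.8, sigma2_V=0.3, sigma2_alpha=1.5, lam=4.0)

# Central finite difference as an independent check on the Jacobian
h = 1e-6
fd = (t_from_eta(eta + h, delta, 4.0) - t_from_eta(eta - h, delta, 4.0)) / (2 * h)
```

Since $1 + t\lambda/\delta = 1/(1-\eta)$ under this map, the factor $(1 + t\lambda/\delta)^{-K/2}$ becomes $(1-\eta)^{K/2}$, which is where the $(1-\eta)$ power in Theorem 8.2 picks up its $K/2$.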
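Theorem 8.2:

Given the regression model (2) and the assumption that
$\lambda_i = \lambda$ for all $i = 1, 2, \ldots, K$, the posterior density of $\eta$ is

$$p'(\eta) \propto \eta^{(A_V - A_\alpha)/2 - 1}(1 - \eta)^{(K + A_\alpha - A_V)/2 - 1} f(\eta),$$

where $f(\eta)$ is given by

$$f(\eta) = \int_1^\infty \delta^{(A_V - A_\alpha - T)/2}(\delta - 1)^{-A_V/2-1}\exp\left\{-\frac{C_V\,\eta\,\delta}{2\lambda(\delta - 1)(1 - \eta)}\right\} d^{-(T+A+A_\alpha)/2}\,d\delta$$

and

$$d = C_0 + \frac{\lambda C_\alpha(1 - \eta)}{\eta\,\delta} + \frac{1}{\delta}\left(y'y - \frac{\eta}{\lambda}\,q'q\right).$$

Proof

$p'(\eta)$ is obtained from $p''(t, \delta)$ by transforming to the $(\eta, \delta)$
space and then integrating with respect to $\delta$. Q.E.D.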


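Having obtained $p'(\eta)$ (normalized to integrate to one), we can obtain $E[\eta \mid \text{data}]$, given by

$$E[\eta \mid \text{data}] = \int_0^1 \eta\,p'(\eta)\,d\eta,$$

which can be calculated by numerical integration methods.
This enables us to compute the estimator $\hat{\alpha}(B)$ and, by
Theorem 8.1, the estimator $\hat{\beta}(B)$ of $\beta$.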
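One way to carry out the numerical integration is a midpoint rule on $(0, 1)$ applied to a density known only up to a constant. In the sketch below, `log_unnorm_density` is a hypothetical argument: for this chapter it would return the log of the kernel in Theorem 8.2, which itself requires evaluating $f(\eta)$ by a further one-dimensional integral over $\delta \in (1, \infty)$. To keep the example checkable, a Beta(3, 5) kernel with known mean $3/8$ stands in for $p'(\eta)$; the plug-in step then mirrors $\hat{\alpha}_i(B) = E[\eta_i \mid \text{data}]\hat{\alpha}_i(0) + (1 - E[\eta_i \mid \text{data}])a_i'\mu$.

```python
import math

def posterior_mean_eta(log_unnorm_density, n=20000):
    """E[eta | data] on (0, 1) by the midpoint rule; the density need only be known up to a constant."""
    h = 1.0 / n
    grid = [(j + 0.5) * h for j in range(n)]
    logs = [log_unnorm_density(e) for e in grid]
    top = max(logs)                         # shift logs to avoid overflow/underflow
    w = [math.exp(v - top) for v in logs]
    return sum(e * wi for e, wi in zip(grid, w)) / sum(w)

def evaluated_alpha_i(e_eta, alpha_ols_i, prior_mean_i):
    """alpha_i(B) = E[eta_i|data] alpha_i(0) + (1 - E[eta_i|data]) a_i'mu."""
    return e_eta * alpha_ols_i + (1.0 - e_eta) * prior_mean_i

def beta_kernel(e):
    """Stand-in log density: Beta(3, 5) kernel, whose mean 3/8 is known exactly."""
    return 2.0 * math.log(e) + 4.0 * math.log(1.0 - e)

e_eta = posterior_mean_eta(beta_kernel)
alpha_hat = evaluated_alpha_i(e_eta, alpha_ols_i=2.0, prior_mean_i=0.0)
```

The normalizing constant cancels in the ratio, so only the unnormalized kernel is required; the same idea applies to the inner integral over $\delta$ after mapping $(1, \infty)$ onto a bounded interval.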
8.4. Conclusion


The paper considers the linear regression model with
measurement errors in the independent variables and/or
some independent variables being unobserved. A new,
operational version of the Bayes estimator is derived which
is seen to be admissible. This estimator corresponds to a
class of informative priors on both the unknown parameter
vector and the unknown variances. The posterior
density of the shrinkage factor $\eta_i$ is derived. The Bayes
estimator is evaluated in a novel way with respect to the
expectation of this posterior density. Further research could
be done to develop computer packages to obtain $E[\eta_i \mid \text{data}]$
given the data and the prior parameters.


Notes 


1. The formulation here incorporates both the cases of
measurement errors and/or unobservable variables. If the
true variables $X$ were observed (as $\tilde{X}$) with error given by $V$,
then $\tilde{X} = X + V$. If $X$ consisted of unobserved variables,
then $\tilde{X} = X + V$, where $\tilde{X}$ is the set of observed proxy
variables.

2. The distributional assumption on $V$ is motivated by the fact
that if the number of significant digits after the decimal point
in the $j$-th independent variable is $h$ (known), then the
perturbation in that variable is $\le \tfrac{1}{2}(10)^{-h}$, which is based
on Wilkinson's (1965) error bound for rounding-off errors.
Since the uniform distribution is often used in the context of
rounding-off errors, we use the mean and the variance of the
uniform variable but not its range. For a uniform variable
ranging from $-d$ to $d$, the mean is 0 and the variance is $d^2/3$.
Thus, $E(V_{ij}) = 0$ for all $i, j$, and $E(V_{ij}^2) = d^2/3 = \sigma_V^2$ for all $i, j$. By adopting the
larger range $(-\infty, \infty)$, we allow for larger measurement errors
than rounding-off errors.
3. The Bayes estimator here corresponds to the prior
assumed on $\beta$, namely $\beta \sim N(\mu, \sigma_\beta^2 I)$. More generally, the
covariance of the prior on $\beta$ is of the form $\Sigma_\beta$, but this makes
the analysis very complicated. The less general assumption
of a scalar covariance matrix is made here to keep the analysis
simple without any sacrifice in content.

4. This is because a linear combination of normal random
variables is also normal. With the normal prior assumed on
$\beta$, $\gamma$ and $\alpha$ are also normal, in addition to $\epsilon$ (which is assumed
to be normal).
5. This follows by transforming from the $(\sigma^2, \sigma_V^2, \sigma_\alpha^2)$ space to
the $(\sigma^2, t, \delta)$ space and then integrating the resulting posterior
density with respect to $\sigma^2$.
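The rounding argument in Note 2 can be illustrated with a short calculation: with $h$ decimal digits the error bound is $d = \tfrac{1}{2}10^{-h}$, and a uniform error on $(-d, d)$ has variance $d^2/3$. The sketch below, in plain Python, rounds a fine, evenly spaced grid of hypothetical values to $h = 2$ decimals and compares the empirical variance of the rounding errors with $d^2/3$; the grid is an arbitrary choice made only for illustration.

```python
def rounding_error_variance(h):
    """Variance d^2/3 of a uniform rounding error on (-d, d), with d = 0.5 * 10**(-h)."""
    d = 0.5 * 10.0 ** (-h)
    return d * d / 3.0

# Empirical check: round evenly spaced values to h = 2 decimal places.
h = 2
values = [j * 0.0000137 for j in range(100000)]
errors = [round(v, h) - v for v in values]
mean_err = sum(errors) / len(errors)
var_err = sum((e - mean_err) ** 2 for e in errors) / len(errors)

print(rounding_error_variance(h), var_err)  # both close to 8.3e-06
```

The note then deliberately keeps only the first two moments of this uniform error and lets $V$ be normal, which allows for errors larger than pure rounding.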
Conclusions


In this final chapter, an account of the chief points made 
in this study is given, and a few ideas and topics are 
suggested for further research. The main results and 
conclusions are emphasized at the ends of the chapters, 
but the overall summary here will provide a broader 
perspective for the study as a whole. 


9.1 Summary 


This study proposed the application of the Bayesian 
standpoint and approach to economics and econometric 
methods. The focus of this study was on demonstrating the
usefulness and applicability of Bayesian methods in
econometrics and economic theory. We began with a review
of Classical inference in econometrics and contrasted it with 
Bayesian inference as applicable in econometrics. In 
particular, the general linear model was taken as an example 
of an important tool in econometric research and inference, 
and in this context both Classical and Bayesian inference 
were evaluated. Using a linear model, economic hypotheses 
important from the point of view of policy-making are 
represented. Hypotheses in question are then tested using 
available data and econometric methods are applied to the 
general linear model (of regression). With the axiom of correct 
specification being questioned, classical inference based on 
ordinary least squares estimation is not justified. In addition, 
in a non-experimental science like economics, data sets are 
often weak and not very informative. In such instances, a 
formal incorporation of prior information is found to be 
useful. 
We contrasted and compared the Bayesian and Classical
approaches at some length. In addition, we have discussed
briefly the foundations and implications of the Bayesian
approach and the potential for its application to econometrics
and economic theory. A brief review of some basic concepts
as well as the analytical framework of Bayesian inference in
relation to estimation theory in econometrics was presented.
Important results relating to the admissibility of estimators
were summarized.
A complete Bayesian analysis was discussed in the 
context of the linear model. The posterior distributions of 
the ‘shrinkage’ factors of the ridge estimators were derived 
by assuming priors on the unknown parameters. The most 
general case was difficult to analyse as it involved working 
with zonal polynomials. A special case was considered in 
which we obtain the Bayes estimator for the linear model. 
This was seen to be admissible. In addition, the ridge 
estimator is seen to be equivalent to the Bayes estimator. A 
modified version of principal component regression was 
discussed. An aggregate consumption function was 
considered to illustrate the methodology discussed and the 
OLS estimators and the ridge estimators were compared 
using a data set that was weak and multicollinear. We also 
plotted the likelihood function of the shrinkage factors and 
compared it to the posterior density of the shrinkage factors 
in the case of two variables. The methodology in the 
multivariate case was based on the creative and insightful 
work of Hill (1977) in the context of the one-way balanced 
random effects model. It was seen that the posterior density 
obtained in the context of the multivariate general model 
was comparable to the density derived by Hill (1977) and as 
discussed, it is seen that our multivariate posterior density 
is a generalization of the density due to Hill (1977) in the 
case of the one way balanced model. 


Other variants of the linear model were considered and 
the analysis of the earlier chapter was extended to these 
situations. We considered the case of the linear model with 
autocorrelated disturbances (where the serial correlation was 
assumed to be of the first order), and also the linear model 
with an explicit assumption of errors in measurement of 
the variables. The analysis in its most general form could
not be carried to completion, for a number of reasons. Firstly,
in the general linear model, the transformation of the linear
model to its reparametrized form imposes certain restrictions
on the reparametrized coefficient vector and the prior that
is induced on it. These restrictions alter the form of the
posterior density of the shrinkage factors, making it less
elegant than it would have been without these induced
restrictions on account of the transformation. As seen earlier,
the induced prior in the most general case involves the study
of zonal polynomials, with the result that the analysis
becomes very complicated.


Secondly, in the model with autocorrelated disturbances, 
we needed to make an assumption on the covariance matrix 
to proceed with our analysis. Also, the most general case in 
the model with errors in measurement was found difficult 
to analyse. The prior covariance matrix of the coefficient 
vector was assumed to be proportional to the identity matrix. 
This is not very general although our analysis proceeds in 
the presence of errors in measurement of the independent 
variables. Lastly, in all the models considered, the posterior 
density of the shrinkage factors was expanded using 
multinomial expansions and logarithmic transformations. 
Thus, an important limitation of this analysis is that in 
general, without making any explicit assumption on these 
eigenvalues, we cannot express the posterior density in the 
general form that we do. Numerical integration methods 
could be employed to work with the most general form of 
the posterior density. 


The second segment of the thesis consisted of applications 
of the inference structure developed earlier to economic 
theory. A Bayesian expectations scheme was presented that 
made use of prior information. The adaptive expectations 
scheme was seen to be a special case of the Bayesian variant. 
The version developed here, unlike Turnovsky’s (1974) 
approach, assumed that the market participants as well as 
the econometricians were unaware of the variances that were 
involved in the decision making processes. A consumption 
function example was discussed based on the permanent 
income hypothesis. Bayes estimates were computed and 
reported with the OLS estimates. 
A simple duopoly model was presented where firms made
use of prior information in their decision-making. Labour
supply under uncertainty was also considered from the 
Bayesian point of view. Labour supply was seen to be 
dependent on unemployment and other factors. Utilizing 
an expected utility maximizing framework, the worker was 
confronted with a stochastic labour-leisure constraint, which 
incorporated the prior knowledge of the worker about the 
prospects of his employment in a given time period. This 
was the demarcating feature from earlier analyses of Hartley 
and Revankar (1974), Sjoquist (1976) and Yaniv (1979). 


9.2 Scope for Further Research 


We conclude with an outline of areas in which research 
can be pursued on the lines of this study. The following 
themes suggest themselves: 


(1) In the Bayesian inference methods developed here, 
more models could be considered, such as the 
simultaneous equations model or a pooled cross 
section-time series model that would have direct 
bearing on economic research. In the models 
considered here, the posterior densities obtained could 
be studied with respect to the choice of the prior 
parameters. These sensitivity analyses could be critical 
and would suggest guidelines in the choice of the prior 
parameters. 


Also, it would be worthwhile to study the behaviour of 
the posterior density of the shrinkage factors as the sample 
size of the data set is increased. As pointed out earlier, the 
most general form of the prior induced on a could be studied 
and conditions obtained under which the analysis would go 
through. This would provide an extension of this analysis. 


Alternatively, we could have analyzed the untransformed
model given in (3.1) and directly evaluated the shrinkage
factors as $E[(X'X + K)^{-1} \mid y, X]$, thereby obtaining the Bayes
estimator of $\beta$. The term $(X'X + K)^{-1}$ could be expanded in a
power series and the expectation could be obtained by
making assumptions on $K$. This could yield interesting
results that could be compared to the results obtained in
this study.

(2) Applications to other economic models in other fields
would follow naturally from a study such as ours.
Hypotheses based on the consumption function
examples could be tested, and different specifications
of the consumption function could be estimated. The
Bayesian variant of the expectations hypotheses could
be applied to other contexts where the formation of
expectations of economic agents is important.


Games with incomplete information could be developed
in the context of oligopolistic modelling using a Bayesian
approach, as well as following Dov Fried (1984); other
variations could be developed wherein one could make
alternative assumptions about the cost functions of the
oligopolists in the market. The modified labour supply
function could be empirically tested using data on the 
significance of the dependence of the individual’s labour 
supply on the mean of the prior rate of unemployment. Other 
comparative static exercises could be pursued and followed 
in this framework. 


Appendices 


Appendix- A 
Simplification of the Posterior Density in Chapter 3 


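This appendix considers in detail the further
simplification of the posterior density in Chapter 3 given by
equation 3.3. This posterior density is

$$p''(\sigma^2, \sigma_{\alpha_1}^2, \ldots, \sigma_{\alpha_K}^2) \propto |\Sigma|^{-1/2}(\sigma^2)^{-A/2-1}\prod_{i=1}^{K}(\sigma_{\alpha_i}^2)^{-(\nu+K+1)/2} \times \prod_{i<j}(\sigma_{\alpha_i}^2 - \sigma_{\alpha_j}^2)\exp\left\{-\tfrac{1}{2}\left[\frac{C_0}{\sigma^2} + y'\Sigma^{-1}y + \sum_{i=1}^{K}\frac{D}{\sigma_{\alpha_i}^2}\right]\right\},$$

given $y$, $X$ and the transformation matrix $A$. On writing the
explicit expression for $\Sigma$, namely $\Sigma = \sigma^2(I + Z\Gamma Z')$ with $\Gamma = \mathrm{diag}(\sigma_{\alpha_1}^2/\sigma^2, \ldots, \sigma_{\alpha_K}^2/\sigma^2)$, we obtain the posterior density as

$$p''(\sigma^2, \sigma_{\alpha_1}^2, \ldots, \sigma_{\alpha_K}^2) \propto |I + Z\Gamma Z'|^{-1/2}(\sigma^2)^{-(T+A)/2-1}\prod_{i=1}^{K}(\sigma_{\alpha_i}^2)^{-(\nu+K+1)/2} \times \prod_{i<j}(\sigma_{\alpha_i}^2 - \sigma_{\alpha_j}^2)\exp\left\{-\frac{1}{2\sigma^2}\left[C_0 + \sigma^2\sum_{i=1}^{K}\frac{D}{\sigma_{\alpha_i}^2} + H(y, Z)\right]\right\},$$

$$H(y, Z) = y'(I + Z\Gamma Z')^{-1}y.$$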
The term $H(y, Z)$ is simplified as

$$y'(I + Z\Gamma Z')^{-1}y = y'y - y'Z(I + \Gamma^{-1})^{-1}Z'y.$$

[See result T8, Appendix I of Leamer (1978), p. 324.]
Next, we consider $|I + Z\Gamma Z'|$ and express this determinant
in terms of the elements of $\Gamma$. The matrices $Z\Gamma Z'$ and $Z'Z\Gamma = \Gamma$
have the same set of eigenvalues except for a set of zeros
[see Theil (1971), p. 26, G8; this result has been used here].
Since $\Gamma$ is diagonal, the eigenvalues of $\Gamma$ are the elements of
$\Gamma$. Thus the first $K$ eigenvalues of $Z\Gamma Z'$ are $\sigma_{\alpha_i}^2/\sigma^2$,
$i = 1, 2, \ldots, K$, and the remaining $(T - K)$ are all zero. The first
$K$ eigenvalues of $(I + Z\Gamma Z')$ are therefore $(1 + \sigma_{\alpha_i}^2/\sigma^2)$, where
$i = 1, 2, \ldots, K$, and the remaining $(T - K)$ are all equal to one. If $\Delta$ is the
matrix of eigenvectors of $(I + Z\Gamma Z')$, then

$$\Delta'\Omega\Delta = \Lambda \quad \text{and} \quad \Delta'\Delta = I,$$

where

$$\Omega = (I + Z\Gamma Z'), \qquad \Lambda = \mathrm{diag}\left[(1 + \sigma_{\alpha_1}^2/\sigma^2), (1 + \sigma_{\alpha_2}^2/\sigma^2), \ldots, (1 + \sigma_{\alpha_K}^2/\sigma^2), 1, 1, \ldots, 1\right].$$

Thus,

$$|\Delta'\Omega\Delta| = |\Lambda| = \prod_{i=1}^{K}(1 + \sigma_{\alpha_i}^2/\sigma^2).$$

But $|\Delta'\Omega\Delta| = |\Omega|\,|\Delta'\Delta| = |\Omega|$. Thus $|\Delta'\Omega\Delta| = |\Omega| = \prod_{i=1}^{K}(1 + \sigma_{\alpha_i}^2/\sigma^2)$, and we have the desired result

$$|I + Z\Gamma Z'| = \prod_{i=1}^{K}(1 + \sigma_{\alpha_i}^2/\sigma^2).$$
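Both results of this appendix, the eigenvalue structure and determinant of $I + Z\Gamma Z'$ and the simplification of $H(y, Z)$, can be checked numerically when $Z'Z = I$. A minimal NumPy sketch, assuming a random $Z$ with orthonormal columns (from a reduced QR factorization) and arbitrary illustrative values for the diagonal of $\Gamma$:

```python
import numpy as np

rng = np.random.default_rng(3)
T, K = 30, 4
Z, _ = np.linalg.qr(rng.normal(size=(T, K)))   # orthonormal columns, so Z'Z = I
gamma = np.array([0.5, 1.0, 2.0, 4.0])          # illustrative sigma_{alpha_i}^2 / sigma^2
Omega = np.eye(T) + Z @ np.diag(gamma) @ Z.T

eigs = np.linalg.eigvalsh(Omega)                # ascending order
det_direct = np.linalg.det(Omega)
det_closed = np.prod(1.0 + gamma)

y = rng.normal(size=T)
q = Z.T @ y
H_direct = y @ np.linalg.solve(Omega, y)                 # y'(I + Z Gamma Z')^{-1} y
H_closed = y @ y - np.sum(q**2 * gamma / (1.0 + gamma))  # y'y - q'(I + Gamma^{-1})^{-1} q
```

The sorted eigenvalues show $T - K$ values equal to one followed by the $K$ values $1 + \sigma_{\alpha_i}^2/\sigma^2$, which is why only the first $K$ factors survive in the determinant.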
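Using this result, the posterior density is written as

$$p''(\sigma^2, \sigma_{\alpha_1}^2, \ldots, \sigma_{\alpha_K}^2) \propto (\sigma^2)^{-(T+A)/2-1}\prod_{i=1}^{K}(1 + \sigma_{\alpha_i}^2/\sigma^2)^{-1/2}\prod_{i=1}^{K}(\sigma_{\alpha_i}^2)^{-(\nu+K+1)/2} \times \prod_{i<j}(\sigma_{\alpha_i}^2 - \sigma_{\alpha_j}^2)\exp\left\{-\frac{1}{2\sigma^2}\left[Q_0 + \sigma^2\sum_{i=1}^{K}\frac{D}{\sigma_{\alpha_i}^2} - H(q, \Gamma)\right]\right\},$$

where $q = Z'y$, $Q_0 = y'y + C_0$ and $H(q, \Gamma) = q'(I + \Gamma^{-1})^{-1}q$.

This result will be used in Chapter 3.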


Appendix- B


Validity of the Expansion of $m(\eta_1, \eta_2, \ldots, \eta_K)$

The logarithmic expansion written for $m(\eta_1, \eta_2, \ldots, \eta_K)/m_0$ is

$$\ln\frac{m(\eta_1, \ldots, \eta_K)}{m_0} = -\sum_{i=1}^{\infty}\frac{1}{i}\left[1 - \frac{m(\eta_1, \ldots, \eta_K)}{m_0}\right]^i, \qquad 0 < m(\eta_1, \ldots, \eta_K) < m_0.$$

We are ignoring all points where the function $m(\eta_1, \ldots, \eta_K)$ vanishes. The
function $m(\eta_1, \ldots, \eta_K) = 0$ occurs only when the numerator
$Q(\eta_1, \ldots, \eta_K) = 0$, as the denominator does not go to $\infty$. The numerator is

$$Q(\eta_1, \ldots, \eta_K) = Q_0 + \sum_{i=1}^{K}\frac{D(1 - \eta_i)}{\eta_i} - \sum_{i=1}^{K}q_i^2\eta_i.$$

Multiplying through by $\prod_{j=1}^{K}\eta_j$, this term is rewritten as

$$Q(\eta_1, \ldots, \eta_K)\prod_{j=1}^{K}\eta_j = Q_0\prod_{i=1}^{K}\eta_i + \sum_{i=1}^{K}D(1 - \eta_i)\prod_{j \neq i}\eta_j - \sum_{i=1}^{K}q_i^2\eta_i\prod_{j=1}^{K}\eta_j = V_0 + \sum_{i=1}^{K}T_i,$$

where

$$V_0 = Q_0\prod_{i=1}^{K}\eta_i - \sum_{i=1}^{K}q_i^2\eta_i\prod_{j=1}^{K}\eta_j, \qquad T_i = D(1 - \eta_i)\prod_{j \neq i}\eta_j.$$

Now, $T_i = 0$ if $\eta_i = 1$ or $\eta_j = 0$ for any $j \neq i$,
assuming that $\eta_i \geq 0$ for all $i = 1, 2, \ldots, K$. Also, $V_0 = 0$ when
$\eta_i = 0$ for any $i$, and $V_0 \geq 0$ otherwise. Thus,

$$Q(\eta_1, \ldots, \eta_K) = 0 \Rightarrow T_i = 0 \text{ for all } i \text{ and } V_0 = 0.$$

Hence,

$$Q(\eta_1, \ldots, \eta_K) = 0 \Rightarrow \eta_i = 0 \text{ for all } i = 1, 2, \ldots, K.$$

Conversely, when $\eta_i = 0$ for all $i$, $Q(\eta_1, \ldots, \eta_K) = 0$. Thus,
$m(\eta_1, \ldots, \eta_K) = 0$ if $\eta_i = 0$ for all $i$.
Thus, the logarithmic expansion employed is valid for
$\eta_i \in (0, 1)$ for all $i$. The set $\eta_i = 0$ for all $i$ is excluded from this range.
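The expansion used above is the series $\ln r = -\sum_{i \ge 1}(1 - r)^i/i$ with $r = m/m_0 \in (0, 1)$. A short plain-Python sketch (the function name is a hypothetical helper) truncates the series and compares it with `math.log`; the guard mirrors the restriction $0 < m < m_0$ stated in the text.

```python
import math

def log_ratio_series(m, m0, n_terms=500):
    """Truncated -sum_{i>=1} (1 - m/m0)^i / i, which converges to ln(m/m0) for 0 < m < m0."""
    if not 0.0 < m < m0:
        raise ValueError("the expansion is used only for 0 < m < m0")
    a = 1.0 - m / m0
    return -sum(a ** i / i for i in range(1, n_terms + 1))

for ratio in (0.9, 0.5, 0.1):
    print(ratio, log_ratio_series(ratio, 1.0), math.log(ratio))
```

Convergence slows as $m/m_0$ approaches zero (then $a = 1 - m/m_0$ approaches one), which is why points where $m$ vanishes have to be set aside.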


Appendix- C


Validity of the Multinomial Expansion for an Infinite
Sum


In this appendix, we show that we are able to expand an
infinite sum using a multinomial expansion for a finite sum.
For each positive integer $i$, let

$$x_i = \frac{a^i}{i},$$

where

$$0 < a = 1 - \frac{m(\eta_1, \eta_2, \ldots, \eta_K)}{m_0} < 1.$$

We are interested in obtaining an expansion for $\left(\sum_{i=1}^{\infty}x_i\right)^k$.
According to a well known formula (see L.B.W. Jolley,
Summation of Series, second revised edition, p. 20, Dover
Publications Inc., New York, 1961),

$$\sum_{i=1}^{\infty}x_i = -\ln(1 - a) = \ln\frac{m_0}{m(\eta_1, \eta_2, \ldots, \eta_K)}.$$

Call this sum $S$, and let

$$S_m = x_1 + x_2 + \cdots + x_m$$

for each positive integer $m$. Let $k$ be a positive integer. As
$m \to \infty$, we have $S_m \to S$ and hence

$$S_m^k \to S^k$$

by the continuity of the function $x^k$, that is,

$$\left(\sum_{i=1}^{\infty}x_i\right)^k = S^k = \lim_{m \to \infty}S_m^k = \lim_{m \to \infty}\left(\sum_{i=1}^{m}x_i\right)^k.$$

According to the multinomial theorem,

$$\left(\sum_{i=1}^{m}x_i\right)^k = \sum_{\{l\}}k!\prod_{i=1}^{m}\frac{x_i^{l_i}}{l_i!},$$

where $l$ is a sequence of non-negative integers satisfying
$0 \le l_i \le k$ for all $i$ and $\sum_{i=1}^{m}l_i = k$.

This sum can be rewritten as

$$\sum_{l \in P_m}k!\prod_{i=1}^{\infty}\frac{x_i^{l_i}}{l_i!},$$

where $P_m$ is the set of all infinite sequences
$l = (l_1, l_2, \ldots)$ of non-negative integers
satisfying $\sum_{i=1}^{\infty}l_i = k$ and $l_i = 0$ for all $i > m$. This is because

$$\frac{x_i^{l_i}}{l_i!} = \frac{x_i^0}{0!} = 1$$

for all $i > m$, so that

$$\prod_{i=1}^{\infty}\frac{x_i^{l_i}}{l_i!} = \prod_{i=1}^{m}\frac{x_i^{l_i}}{l_i!},$$

and because the elements $l$ of $P_m$ are in one-to-one
correspondence with the $m$-tuples $(l_1, l_2, \ldots, l_m)$
satisfying $0 \le l_i \le k$ for all $i$ and $\sum_{i=1}^{m}l_i = k$.

The analogous multinomial expansion for the
infinite sum $\sum_{i=1}^{\infty}x_i$ is

$$\left(\sum_{i=1}^{\infty}x_i\right)^k = \sum_{l \in P}k!\prod_{i=1}^{\infty}\frac{x_i^{l_i}}{l_i!}, \qquad (1)$$

where $P$ is the set of all infinite sequences
$l = (l_1, l_2, \ldots)$ of non-negative integers such that $l_1 + l_2 + \cdots = k$. If
$l \in P$, then $l \in P_m$ for all sufficiently large $m$. Since the sum
on the right side of equation (1) consists of positive terms,
the order of summation is immaterial and we can write

$$\sum_{l \in P}k!\prod_{i=1}^{\infty}\frac{x_i^{l_i}}{l_i!} = \lim_{m \to \infty}\sum_{l \in P_m}k!\prod_{i=1}^{\infty}\frac{x_i^{l_i}}{l_i!}.$$

This shows that

$$\lim_{m \to \infty}(x_1 + x_2 + \cdots + x_m)^k = \lim_{m \to \infty}S_m^k = S^k = \left(\sum_{i=1}^{\infty}x_i\right)^k.$$

This proves the equality in equation (1), given by

$$\left(\sum_{i=1}^{\infty}x_i\right)^k = \sum_{l \in P}k!\prod_{i=1}^{\infty}\frac{x_i^{l_i}}{l_i!}.$$
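The argument can be made concrete by enumerating every composition $(l_1, \ldots, l_m)$ of $k$, forming $k!\prod x_i^{l_i}/l_i!$ with $x_i = a^i/i$, and watching the truncated expansion approach $[-\ln(1-a)]^k = [\ln(m_0/m)]^k$. The plain-Python sketch below uses illustrative values $a = 0.5$, $k = 3$ and truncation $m = 20$; the helper names are hypothetical.

```python
import math

def compositions(k, m):
    """Yield every m-tuple of non-negative integers (l_1, ..., l_m) with sum k."""
    if m == 1:
        yield (k,)
        return
    for first in range(k + 1):
        for rest in compositions(k - first, m - 1):
            yield (first,) + rest

def multinomial_power(x, k):
    """Sum over compositions l of k!*prod x_i^{l_i}/l_i!, which equals (x_1 + ... + x_m)^k."""
    total = 0.0
    for l in compositions(k, len(x)):
        term = float(math.factorial(k))
        for xi, li in zip(x, l):
            term *= xi ** li / math.factorial(li)
        total += term
    return total

a, k, m = 0.5, 3, 20
x = [a ** i / i for i in range(1, m + 1)]   # x_i = a^i / i, truncated at m terms
expanded = multinomial_power(x, k)
direct = sum(x) ** k
limit = (-math.log(1.0 - a)) ** k           # (sum over all i of x_i)^k
```

All terms are positive, so increasing $m$ only adds terms and the truncated sums increase monotonically towards the limit, as the proof requires.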


Appendix- D 


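Expansion of $m(\eta_1, \eta_2, \ldots, \eta_K)$: Further Simplification


In this appendix, we consider in detail the further
simplification of $m(\eta_1, \eta_2, \ldots, \eta_K)$. We begin with the LHS,
expressed as

$$\text{LHS} = \sum_{k=0}^{\infty}\frac{(A/2)^k}{k!}\left[\sum_{i=1}^{\infty}\frac{1}{i}\left(\frac{m_0 - m(\eta_1, \ldots, \eta_K)}{m_0}\right)^i\right]^k.$$

Using the multinomial expansion for $\left(\sum_{i=1}^{\infty}x_i\right)^k$, we write
this as follows: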
st 


ees . K ~ ‘ 
say CVU M6}"[Q + SAN) / m5: - G2)” 
(my -mj =P = 


. in |A as | 
Ui3=0 1 eM Te 7; In? (TT O(n) 


J=itl 


t=] 
where $\sum_{i=1}^{\infty}i\,l_i$ is finite, since only a finite number of the $l_i$ are
non-zero (because $\sum_{i=1}^{\infty}l_i = k$). Thus,


5 an] [wig re! qe ee diy 


i je 
(rm, - my = y 2 ee 
Ui}=0 dj=0 h=0 ; Aldo I(r) (Q)) (ay T Tou (77;)) 2 T Ta !d,>) 


i=] i=] 


BJ 


2K 
where (). > dj) +do = jg. This simplification follows from 
jal i=l 


the binomial expansion of Il (l- 7m /7 ee > which is given 


jeitl 


by 


where 9h = 252 / i}; 
The expression for LHS is now modified: 
k=0 {l}=0 {j}=0 dj=0 h=0 


an] [tet 2p i142 (1) ea — 


t=] 
? 


Jo 


x 


fi Vd \(r79)"* (Qy) Ite: J [ar*0 (4) Ie dig!) 


i=] i=] 
where q;, = 1, li! (m,)'. Thus, we have


ty iw 


fms tass Mele? oe », > » 


k=O {l}=0 { aes =0 he=O 
2d, _dj;-djg+(i-2)h ; 
an (A /2)*i'(—1)42*2* [Is ee ogy 
ie | 2 tole are eine Pca Sa t= 


oo K 
jy 'do (mg) (Qo)? | | te) I iy (O:(7; TT (d; !p; !) 
i=] 


i=] 


We will use this expression for $m(\eta_1, \eta_2, \ldots, \eta_K)$ in Chapter
3 for further analysis.


Appendix- E


Study of $f(A, B, C; z_1, z_2)$: Extension of the
Hypergeometric Function


In this Appendix, we present and discuss the results
relating to the generalization of the hypergeometric function
due to Hill (1977). These results are applied and used in
this study in several contexts.

The chief concern here is with the moments
of the posterior distribution of $\eta$, which depend on
$f(A, B, C; z_1, z_2)$. Hill (1977) shows that $f(A, B, C; z_1, z_2)$
can be viewed as a generalization of the hypergeometric
function studied by Appell. [See Bailey (1935).] The
hypergeometric function is defined as

$$F(a, b, c; z) = 1 + \frac{ab}{1 \times c}z + \frac{a(a+1)b(b+1)}{1 \times 2 \times c(c+1)}z^2 + \cdots$$

This is equivalent (for $c > b > 0$) to

$$F(a, b, c; z) = \frac{\Gamma(c)}{\Gamma(b)\Gamma(c - b)}\int_0^1 x^{b-1}(1 - x)^{c-b-1}(1 - xz)^{-a}\,dx.$$
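The equivalence of the series and the Euler integral can be checked directly. The plain-Python sketch below sums the Gauss series for $|z| < 1$ and evaluates the integral by Simpson's rule; the function names are hypothetical, and the integral version assumes $b \ge 1$ and $c - b \ge 1$ so that the integrand stays finite at both ends. The known case $F(1, 1, 2; z) = -\ln(1 - z)/z$ gives an independent check.

```python
import math

def hyp2f1_series(a, b, c, z, tol=1e-15, max_terms=100000):
    """Gauss series 1 + ab/c z + a(a+1)b(b+1)/(2! c(c+1)) z^2 + ..., for |z| < 1."""
    term, total, n = 1.0, 1.0, 0
    while abs(term) > tol * abs(total) and n < max_terms:
        term *= (a + n) * (b + n) / ((c + n) * (n + 1)) * z
        total += term
        n += 1
    return total

def hyp2f1_euler(a, b, c, z, n=2000):
    """Euler integral by Simpson's rule (n even); assumes b >= 1 and c - b >= 1."""
    def f(x):
        return x ** (b - 1) * (1 - x) ** (c - b - 1) * (1 - x * z) ** (-a)
    h = 1.0 / n
    s = f(0.0) + f(1.0)
    s += 4 * sum(f((2 * j - 1) * h) for j in range(1, n // 2 + 1))
    s += 2 * sum(f(2 * j * h) for j in range(1, n // 2))
    return math.gamma(c) / (math.gamma(b) * math.gamma(c - b)) * s * h / 3

print(hyp2f1_series(1, 1, 2, 0.5), -math.log(0.5) / 0.5)
print(hyp2f1_series(1.5, 2, 5, 0.6), hyp2f1_euler(1.5, 2, 5, 0.6))
```

The integral form is the one that generalizes naturally to $f(A, B, C; z_1, z_2)$, since it exhibits the beta-type kernel $x^{b-1}(1-x)^{c-b-1}$ directly.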
The generalization of this is given by


Poa Olea ig 


f(A, B,C; 21,22) = [a 2 x)= Zo(1 ~ xe 


C.1 Relationship between $F(a, b, c; z)$ and $f(A, B, C; z_1, z_2)$:


1. If $z_2 = 0$, then $f(A, B, C; z_1, z_2)$ reduces to

$$f(A,B,C;z_1,0) = \frac{\Gamma(A+1)\,\Gamma(B+1)}{\Gamma(A+B+2)}\,F(C, A+1, A+B+2; z_1).$$

If $z_1 = 0$, then we have

$$f(A,B,C;0,z_2) = \frac{\Gamma(A+1)\,\Gamma(B+1)}{\Gamma(A+B+2)}\,F(C, B+1, A+B+2; z_2).$$

If both $z_1 = 0$ and $z_2 = 0$, then

$$f(A,B,C;0,0) = \frac{\Gamma(A+1)\,\Gamma(B+1)}{\Gamma(A+B+2)},$$

with the posterior distribution of $\eta_i$ collapsing to a beta distribution. This implies that when one or both of $z_1, z_2$ equal zero (a condition that can equivalently be stated in terms of the roots of the associated quadratic equation), the integrals $f(A, B, C; z_1, z_2)$ are multiples of the hypergeometric function. If either $z_i$ tends to 1, then $f(A, B, C; z_1, z_2)$ must be analyzed in the limit $z_i \to 1$, since the series can diverge at $z_i = 1$.
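The three reductions in item 1 can be checked numerically. The sketch below assumes that $f$ is the integral $\int_0^1 x^A(1-x)^B(1-z_1x)^{-C}\{1-z_2(1-x)\}^{-C}\,dx$, and it restricts itself to $A, B \ge 0$ and $0 \le z_1, z_2 < 1$, where a plain Simpson rule is adequate. The helper names `hill_f` and `beta_const` and the parameter values are hypothetical.

```python
import math


def simpson(g, lo, hi, n=4000):
    """Composite Simpson rule for g on [lo, hi] with an even number of panels."""
    n += n % 2
    h = (hi - lo) / n
    s = g(lo) + g(hi)
    for k in range(1, n):
        s += (4 if k % 2 else 2) * g(lo + k * h)
    return s * h / 3


def hill_f(A, B, C, z1, z2, n=4000):
    """f(A, B, C; z1, z2) by quadrature, ASSUMING the integrand
    x^A (1 - x)^B (1 - z1 x)^(-C) (1 - z2 (1 - x))^(-C) on [0, 1]."""
    def integrand(x):
        return (x ** A * (1 - x) ** B
                * (1 - z1 * x) ** (-C) * (1 - z2 * (1 - x)) ** (-C))
    return simpson(integrand, 0.0, 1.0, n)


def hyp2f1_series(a, b, c, z, tol=1e-15, max_terms=100_000):
    """Gauss hypergeometric series F(a, b, c; z) for |z| < 1."""
    term, total = 1.0, 1.0
    for n in range(max_terms):
        term *= (a + n) * (b + n) / ((n + 1) * (c + n)) * z
        total += term
        if abs(term) <= tol * abs(total):
            break
    return total


def beta_const(A, B):
    """Gamma(A + 1) Gamma(B + 1) / Gamma(A + B + 2), the multiplier in item 1."""
    return math.gamma(A + 1) * math.gamma(B + 1) / math.gamma(A + B + 2)


# Item 1: z2 = 0 gives F(C, A+1, A+B+2; z1); z1 = 0 gives F(C, B+1, A+B+2; z2).
A, B, C = 2.0, 3.0, 4.5
case_z2_zero = (hill_f(A, B, C, 0.4, 0.0),
                beta_const(A, B) * hyp2f1_series(C, A + 1, A + B + 2, 0.4))
case_z1_zero = (hill_f(A, B, C, 0.0, 0.35),
                beta_const(A, B) * hyp2f1_series(C, B + 1, A + B + 2, 0.35))
```

With $z_1 = z_2 = 0$ and $A = 2$, $B = 3$, the quadrature returns the beta-function value $\Gamma(3)\Gamma(4)/\Gamma(7) = 1/60$.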


2. $f(A, B, C; z_1, z_2)$ lends itself to an interpretation in terms of integrated generating functions. The generating function of a random variable $X$ taking only non-negative integer values is

$$m(t) = E(t^X), \qquad 0 < t < 1.$$

For any $a > 0$, $b \ge 0$,

$$\int_0^1 t^{a-1}(1-t)^b\,m(t)\,dt = E\Big[\int_0^1 t^{X+a-1}(1-t)^b\,dt\Big].$$

When $b$ is a non-negative integer, this is further expressed as

$$\int_0^1 t^{a-1}(1-t)^b\,m(t)\,dt = \Gamma(b+1)\,E\big[\{(a+X)(a+X+1)\cdots(a+X+b)\}^{-1}\big].$$


By suitably defining $X$ to be a random variable with a negative binomial distribution and using the result stated above, Hill (1977) shows that

[Equation illegible in source.]

where $B$ is an integer. The posterior mean of $\eta_i$ is then seen to be

$$E[\eta_i \mid y, X] = \frac{E\big[\{(X_1+A+2)\cdots(X_1+A+B+2)\}^{-1}\big]}{E\big[\{(X+A+1)\cdots(X+A+B+1)\}^{-1}\big]},$$

where $X_1$ is defined like $X$.
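The identity in item 2 can be verified for a negative binomial $X$ with $P(X = k) = \binom{r+k-1}{k}p^r(1-p)^k$, whose generating function is $m(t) = \{p/(1-(1-p)t)\}^r$. This is only a sketch of the identity itself, assuming $a \ge 1$ (so that the integrand is finite at $t = 0$) and an integer $b \ge 0$; it does not reproduce the particular negative binomial construction Hill (1977) uses for $f$, and the parameter values are arbitrary.

```python
import math


def simpson(g, lo, hi, n=4000):
    """Composite Simpson rule for g on [lo, hi] with an even number of panels."""
    n += n % 2
    h = (hi - lo) / n
    s = g(lo) + g(hi)
    for k in range(1, n):
        s += (4 if k % 2 else 2) * g(lo + k * h)
    return s * h / 3


def nb_pmf(k, r, p):
    """P(X = k) = C(r + k - 1, k) p^r (1 - p)^k for real r > 0, 0 < p < 1."""
    return math.exp(math.lgamma(r + k) - math.lgamma(r) - math.lgamma(k + 1)
                    + r * math.log(p) + k * math.log1p(-p))


def integrated_pgf(a, b, r, p, n=4000):
    """Left side: integral over [0, 1] of t^(a-1) (1-t)^b m(t), m(t) = E(t^X)."""
    def integrand(t):
        m = (p / (1 - (1 - p) * t)) ** r          # generating function of X
        return t ** (a - 1) * (1 - t) ** b * m
    return simpson(integrand, 0.0, 1.0, n)


def expectation_form(a, b, r, p, kmax=2000):
    """Right side: Gamma(b + 1) E[{(a+X)(a+X+1)...(a+X+b)}^(-1)], integer b >= 0."""
    total = 0.0
    for k in range(kmax):                         # truncate the sum over X = k
        prod = 1.0
        for j in range(b + 1):
            prod *= a + k + j
        total += nb_pmf(k, r, p) / prod
    return math.gamma(b + 1) * total
```

The truncation at `kmax` terms is harmless here because the negative binomial probabilities decay geometrically.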


3. Further results express $f(A, B, C; z_1, z_2)$ as a finite or an infinite sum. Let $z^2 = z_1 + z_2 - z_1z_2$ and $p^2 = 1 - z^2 = p_1p_2$, where $p_i = 1 - z_i$. Then

$$f(A,B,C;z_1,z_2) = p_2^{A+1-C}\sum_{i=0}^{\infty}\binom{C-H}{i}(-z_2)^i\int_0^1 t^{A+i}(1-t)^B(1-z^2t)^{-C}\,dt,$$

where $H = A + B + 2 - C$ and the summation terminates at $i = C - H$ if $C - H$ is a non-negative integer. Alternatively,

$$f(A,B,C;z_1,z_2) = p_2^{A+1-C}\sum_{i=0}^{\infty}\binom{C+i-1}{i}\int_0^1 t^A(1-t)^B\,\frac{(p_2z_1t)^i}{(1-z_2t)^{H+i}}\,dt.$$
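Both series in item 3 can be compared with direct quadrature of $f$. This sketch assumes that $f(A,B,C;z_1,z_2) = \int_0^1 x^A(1-x)^B(1-z_1x)^{-C}\{1-z_2(1-x)\}^{-C}\,dx$, that $p_i = 1 - z_i$, and that the two series read $p_2^{A+1-C}\sum_i\binom{C-H}{i}(-z_2)^i\int_0^1 t^{A+i}(1-t)^B(1-z^2t)^{-C}dt$ and $p_2^{A+1-C}\sum_i\binom{C+i-1}{i}\int_0^1 t^A(1-t)^B(p_2z_1t)^i(1-z_2t)^{-(H+i)}dt$; it is restricted to $A, B \ge 0$. The function names are illustrative.

```python
def simpson(g, lo, hi, n=4000):
    """Composite Simpson rule for g on [lo, hi] with an even number of panels."""
    n += n % 2
    h = (hi - lo) / n
    s = g(lo) + g(hi)
    for k in range(1, n):
        s += (4 if k % 2 else 2) * g(lo + k * h)
    return s * h / 3


def hill_f(A, B, C, z1, z2, n=4000):
    """Direct quadrature of the ASSUMED integrand
    x^A (1 - x)^B (1 - z1 x)^(-C) (1 - z2 (1 - x))^(-C)."""
    def integrand(x):
        return (x ** A * (1 - x) ** B
                * (1 - z1 * x) ** (-C) * (1 - z2 * (1 - x)) ** (-C))
    return simpson(integrand, 0.0, 1.0, n)


def gen_binom(alpha, i):
    """Generalized binomial coefficient C(alpha, i), real alpha, integer i >= 0."""
    out = 1.0
    for j in range(i):
        out *= (alpha - j) / (j + 1)
    return out


def hill_f_series1(A, B, C, z1, z2, max_terms=400, tol=1e-15):
    """First series of item 3; stops early when C - H is a non-negative integer."""
    H = A + B + 2 - C
    zsq = z1 + z2 - z1 * z2                      # z^2 in the text
    total = 0.0
    for i in range(max_terms):
        coef = gen_binom(C - H, i) * (-z2) ** i
        if coef == 0.0:                          # binomial coefficient vanished
            break

        def integrand(t, i=i):
            return t ** (A + i) * (1 - t) ** B * (1 - zsq * t) ** (-C)

        term = coef * simpson(integrand, 0.0, 1.0)
        total += term
        if abs(term) <= tol * abs(total):
            break
    return (1 - z2) ** (A + 1 - C) * total


def hill_f_series2(A, B, C, z1, z2, max_terms=400, tol=1e-15):
    """Second ('alternative') series of item 3; terms shrink roughly like z1^i."""
    H = A + B + 2 - C
    p2 = 1 - z2
    total = 0.0
    for i in range(max_terms):
        coef = gen_binom(C + i - 1, i)

        def integrand(t, i=i):
            return (t ** A * (1 - t) ** B * (p2 * z1 * t) ** i
                    * (1 - z2 * t) ** (-(H + i)))

        term = coef * simpson(integrand, 0.0, 1.0)
        total += term
        if abs(term) <= tol * abs(total):
            break
    return p2 ** (A + 1 - C) * total
```

With $A = 2$, $B = 1$, $C = 3.5$ we have $C - H = 2$, so `hill_f_series1` stops after three terms, illustrating the finite case of the first series.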


4. Let $D = C - H$. When $B$ and $D$ are integers, $f(A, B, C; z_1, z_2)$ can be written as a finite double sum over $j = 0, \ldots, D$ and $i = 0, \ldots, B$:

[Equation illegible in source.]
When $z_1 = 0$, as a special case,

[Equation illegible in source.]


where $B$ is an integer. The function $\psi(\cdot)$ is defined as follows:

$$\psi(z^2; k, r) = \int_0^{z^2} x^k(1-x)^r\,dx, \qquad 0 < z^2 < 1,\; k > -1,$$

where $r$ is arbitrary.
For properties of the function $\psi(\cdot)$ and proofs of results (3) and (4), see Hill (1977).
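The function $\psi(\cdot)$ is an unnormalized incomplete beta function, $\psi(z^2; k, r) = B(z^2; k+1, r+1)$. A minimal quadrature sketch follows; for $-1 < k < 0$ it substitutes $v = x^{k+1}$, so that $x^k\,dx = dv/(k+1)$ and the singularity of $x^k$ at the origin disappears. The name `psi` and the argument `zz` (standing for $z^2$) are illustrative, and for non-integer $k \ge 0$ the plain Simpson rule converges more slowly.

```python
def simpson(g, lo, hi, n=2000):
    """Composite Simpson rule for g on [lo, hi] with an even number of panels."""
    n += n % 2
    h = (hi - lo) / n
    s = g(lo) + g(hi)
    for k in range(1, n):
        s += (4 if k % 2 else 2) * g(lo + k * h)
    return s * h / 3


def psi(zz, k, r, n=2000):
    """psi(zz; k, r) = integral over [0, zz] of x^k (1 - x)^r, 0 < zz < 1, k > -1."""
    if not (0 < zz < 1 and k > -1):
        raise ValueError("need 0 < zz < 1 and k > -1")
    if k >= 0:
        # Integrand is finite on [0, zz]; integrate it directly.
        return simpson(lambda x: x ** k * (1 - x) ** r, 0.0, zz, n)
    # -1 < k < 0: substitute v = x^(k+1), x = v^(1/(k+1)), x^k dx = dv / (k+1).
    e = 1.0 / (k + 1)
    return simpson(lambda v: (1 - v ** e) ** r, 0.0, zz ** (k + 1), n) / (k + 1)
```

For instance, `psi(0.25, -0.5, 1)` evaluates $\int_0^{0.25} x^{-1/2}(1-x)\,dx = 2(0.5) - \tfrac{2}{3}(0.125) = 11/12$.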

C.2 Approximate and Exact Results for $E(\eta_i \mid X, y)$:


1. If $p_1 \to 0$, then

$$E[\eta_i \mid y, X] \approx 1 - \frac{p_1(B+1)}{C-B-2}.$$

2. If $p_2 \to 0$, then

$$E[\eta_i \mid y, X] \approx \frac{p_2(A+1)}{C-A-2}.$$

3. Combining both these asymptotic values, and taking one $p_i$ to be small and the other large,

$$E[\eta_i \mid y, X] \approx \frac{(A+1) + (C-B-2)z_1/p_1}{(A+B+2) + (C-B-2)z_1/p_1 + (C-A-2)z_2/p_2}.$$


4. For any $0 < z_1 < 1$, $0 < z_2 < 1$, the lower and upper bounds for $E[\eta_i \mid y, X]$ are

$$\frac{p_2(A+1)}{A+B+2} < E[\eta_i \mid y, X] < 1 - \frac{p_1(B+1)}{A+B+2}.$$
5. When $A$, $B$, $C$ and $D = C - H$ are integers, letting $A + B + D = G$, $B + 1 = E$ and $D - 1 = F$, we have

[Equation illegible in source.]

where $\delta = z_2/p_2$.


This provides a summary of the applicable results; the reader is referred to Hill (1977) for more detailed explanations and proofs. In situations where $A$, $B$, $C$ and $D$ are integers, $E[\eta_i \mid y, X]$ can be calculated exactly using result (5). Results (1)-(4) yield approximations and bounds for $E[\eta_i \mid y, X]$ and can be useful when the approximations are good. In cases where not all of $A$, $B$, $C$ and $D$ are integers, numerical integration techniques can be employed to compute $E[\eta_i \mid y, X]$.
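A minimal numerical-integration sketch for the non-integer case follows. It assumes, as the ratio form in item 2 suggests, that the posterior density of $\eta_i$ is proportional to the integrand of $f$, so that $E[\eta_i \mid y, X] = f(A+1,B,C;z_1,z_2)/f(A,B,C;z_1,z_2)$, and it assumes the integrand $x^A(1-x)^B(1-z_1x)^{-C}\{1-z_2(1-x)\}^{-C}$ with $A, B \ge 0$. The function names are hypothetical.

```python
def simpson(g, lo, hi, n=4000):
    """Composite Simpson rule for g on [lo, hi] with an even number of panels."""
    n += n % 2
    h = (hi - lo) / n
    s = g(lo) + g(hi)
    for k in range(1, n):
        s += (4 if k % 2 else 2) * g(lo + k * h)
    return s * h / 3


def hill_f(A, B, C, z1, z2, n=4000):
    """Quadrature of the ASSUMED integrand of f(A, B, C; z1, z2)."""
    def integrand(x):
        return (x ** A * (1 - x) ** B
                * (1 - z1 * x) ** (-C) * (1 - z2 * (1 - x)) ** (-C))
    return simpson(integrand, 0.0, 1.0, n)


def posterior_mean_eta(A, B, C, z1, z2, n=4000):
    """E[eta | y, X] as the ratio f(A + 1, B, C; z1, z2) / f(A, B, C; z1, z2)."""
    return hill_f(A + 1, B, C, z1, z2, n) / hill_f(A, B, C, z1, z2, n)
```

With $z_1 = z_2 = 0$ the routine returns the beta mean $(A+1)/(A+B+2)$, and under the assumed integrand the mean satisfies the mirror symmetry $E(A,B,C;z_1,z_2) = 1 - E(B,A,C;z_2,z_1)$, which is a useful check on any implementation.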


Appendix F


Simplification of Posterior Distributions in Chapter 5 


In this appendix, we consider the simplification of the posterior distributions and related functions discussed in detail in Chapter 5.
F.1: Simplification of the Likelihood Function in Section 5.1.1:


The likelihood function considered here is


$$L \propto |\Sigma|^{-1/2}\exp\{-\tfrac{1}{2}\,y'\Sigma^{-1}y\},$$

where $\Sigma$ is as defined in Section 5.1.1. Also, since $\Sigma = \sigma^2(I + Z\Gamma Z')$, we get

$$|\Sigma|^{-1/2} = (\sigma^2)^{-T/2}\,|I + Z\Gamma Z'|^{-1/2}.$$

Using the result proved in Appendix A, the term $|I + Z\Gamma Z'|^{-1/2}$ can be written as a product over $i = 1, \ldots, K$ [factors illegible in source].

Using these results, we modify the likelihood function as

[Equation illegible in source.]

using result T8 in Appendix I, p. 324 of Leamer (1978), which states that

$$y'(I + Z\Gamma Z')^{-1}y = y'y - p'(\Gamma^{-1} + Z'Z)^{-1}p,$$

where $p = Z'y$.


F.2: Simplification of the Likelihood Function in Section 5.2.1:


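The likelihood function is

$$L \propto |\Sigma|^{-1/2}\exp\{-\tfrac{1}{2}\,y'\Sigma^{-1}y\},$$

where $\Sigma$ was defined earlier. Now, since $\Sigma = \mathrm{Cov}(y \mid X, V) = \sigma^2 I + \sigma_\alpha^2 ZZ' + \cdots$ (the last term is illegible in the source), we get

$$|\Sigma|^{-1/2} = (\sigma^2\delta)^{-T/2}\,\big|I + (t/\delta)ZZ'\big|^{-1/2},$$

which can be rewritten as

$$|\Sigma|^{-1/2} = (\sigma^2\delta)^{-T/2}\prod_{i=1}^{K}(1 + t\lambda_i/\delta)^{-1/2},$$

where $t = \sigma_\alpha^2/\sigma^2$, using the result proved in Appendix A. Using these results, we modify the likelihood function as

$$L \propto (\sigma^2\delta)^{-T/2}\prod_{i=1}^{K}(1 + t\lambda_i/\delta)^{-1/2}\exp\Big\{-\frac{1}{2\sigma^2\delta}\,y'\big(I + (t/\delta)ZZ'\big)^{-1}y\Big\}.$$

This is further expressed as

$$L \propto (\sigma^2\delta)^{-T/2}\prod_{i=1}^{K}(1 + t\lambda_i/\delta)^{-1/2}\exp\Big\{-\frac{1}{2\sigma^2\delta}\Big[y'y - \frac{t}{\delta}\,q'\Big(\frac{t}{\delta}\Lambda + I\Big)^{-1}q\Big]\Big\},$$

where, using result T8 in Appendix I of Leamer (1978), p. 324, we write

$$y'\big(I + (t/\delta)ZZ'\big)^{-1}y = y'y - \frac{t}{\delta}\,q'\Big(I + \frac{t}{\delta}\Lambda\Big)^{-1}q,$$

where $q = Z'y$. This result will be used in Section 5.2.1.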
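Both steps above rest on two standard matrix identities: the determinant identity $|I_T + cZZ'| = |I_K + cZ'Z|$ and the Woodbury-type identity behind result T8, $y'(I + cZZ')^{-1}y = y'y - c\,q'(I + cZ'Z)^{-1}q$ with $q = Z'y$. The pure-Python sketch below checks both for a random $Z$, with `c` playing the role of $t/\delta$. The dimensions, seed and value of `c` are arbitrary, and the code uses the general $Z'Z$ rather than assuming it equals the diagonal matrix $\Lambda$.

```python
import random


def matmul(X, Y):
    """Plain-Python matrix product of X (n x m) and Y (m x p)."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


def transpose(X):
    return [list(col) for col in zip(*X)]


def identity_plus(c, M):
    """Return I + c * M for a square matrix M."""
    n = len(M)
    return [[(1.0 if i == j else 0.0) + c * M[i][j] for j in range(n)] for i in range(n)]


def det_and_solve(M, b):
    """Gaussian elimination with partial pivoting; returns (det(M), M^{-1} b)."""
    n = len(M)
    aug = [list(row) + [b[i]] for i, row in enumerate(M)]
    det = 1.0
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if piv != col:
            aug[col], aug[piv] = aug[piv], aug[col]
            det = -det                           # a row swap flips the sign
        det *= aug[col][col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for j in range(col, n + 1):
                aug[r][j] -= factor * aug[col][j]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):               # back substitution
        x[r] = (aug[r][n] - sum(aug[r][j] * x[j] for j in range(r + 1, n))) / aug[r][r]
    return det, x


random.seed(1)
T, K, c = 6, 3, 0.7                  # c stands in for t / delta (arbitrary value)
Z = [[random.gauss(0.0, 1.0) for _ in range(K)] for _ in range(T)]
y = [random.gauss(0.0, 1.0) for _ in range(T)]
Zt = transpose(Z)
q = [sum(Z[i][j] * y[i] for i in range(T)) for j in range(K)]       # q = Z'y

det_T, sol_T = det_and_solve(identity_plus(c, matmul(Z, Zt)), y)    # T x T system
det_K, sol_K = det_and_solve(identity_plus(c, matmul(Zt, Z)), q)    # K x K system
quad_direct = sum(a * b for a, b in zip(y, sol_T))                  # y'(I + cZZ')^{-1} y
quad_woodbury = sum(a * a for a in y) - c * sum(a * b for a, b in zip(q, sol_K))
```

The practical point of the identity is that only a $K \times K$ system has to be solved, instead of a $T \times T$ one.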


F.3: Simplification of the Posterior Distribution in Section 5.2.1:


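We consider the posterior density of $(\sigma^2, \sigma_\alpha^2, h)$ and proceed to simplify it further using multinomial expansions. The posterior density of $(\sigma^2, \sigma_\alpha^2, h)$ is

$$p''(\sigma^2, \sigma_\alpha^2, h) \propto (\cdots)\exp\Big\{-\frac{1}{2\sigma^2}\Big[C_0 + \frac{C_1}{t} + \frac{\sigma^2C_2\,th}{1-h} + H\Big]\Big\},$$

where the leading factor $(\cdots)$, a product of powers of $t$, $\sigma^2$, $\sigma_\alpha^2$, $h$, $1-h$ and $1+th$, is illegible in the source, and where

$$H = h\Big(y'y - \frac{th}{1+th}\sum_{i=1}^{K}q_i^2\Big).$$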
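Consider the following term in the posterior, which is expanded using an exponential series as

$$\exp\Big\{-\frac{1}{2}\Big[\frac{C_2\,th}{1-h} + hQ_0 - \phi Q\Big]\Big\} = \sum_{m=0}^{\infty}\frac{(-1)^m}{2^m\,m!}\Big(\frac{C_2\,th}{1-h} + hQ_0 - \phi Q\Big)^m,$$

where $\phi = th^2/[\sigma^2(1+th)]$, $Q = \sum_{i=1}^{K}q_i^2$ and $Q_0 = y'y/\sigma^2$.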

Using the multinomial expansion, we expand the term $\big(C_2th/(1-h) + hQ_0 - \phi Q\big)^m$ as

$$\sum_{\{l_j\}}\frac{m!\,(-1)^{l_2}}{l_1!\,l_2!\,l_3!}\Big(\frac{C_2\,th}{1-h}\Big)^{l_1}(\phi Q)^{l_2}(hQ_0)^{l_3},$$

where $l_j \ge 0$ and $\sum_{j=1}^{3} l_j = m$.
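The exponential-series and multinomial steps can be checked numerically: expanding each power of $a + b - c$ by the trinomial formula (with the sign carried by the exponent of $c$) and summing the exponential series should reproduce $\exp\{-\tfrac12(a+b-c)\}$. Here `a`, `b` and `c` stand for the three terms of the exponent (the $th/(1-h)$ term, $hQ_0$ and $\phi Q$); the numerical values are hypothetical, and the series is truncated at `m_max`.

```python
import math
from itertools import product


def trinomial_power(a, b, c, m):
    """(a + b - c)^m via the multinomial expansion: sum over l1 + l2 + l3 = m of
    m! / (l1! l2! l3!) * a^l1 * (-1)^l2 * c^l2 * b^l3."""
    total = 0.0
    for l1, l2 in product(range(m + 1), repeat=2):
        l3 = m - l1 - l2
        if l3 < 0:
            continue
        coef = math.factorial(m) // (math.factorial(l1) * math.factorial(l2)
                                     * math.factorial(l3))
        total += coef * (-1) ** l2 * a ** l1 * c ** l2 * b ** l3
    return total


def exp_via_series(a, b, c, m_max=40):
    """exp{-(a + b - c)/2} = sum_m (-1)^m (a + b - c)^m / (2^m m!), with each
    power expanded by trinomial_power and the series truncated at m_max."""
    return sum((-1) ** m * trinomial_power(a, b, c, m) / (2 ** m * math.factorial(m))
               for m in range(m_max + 1))
```

In the text the series are not summed numerically; the expansion is used so that each term can be integrated analytically, which this check does not attempt.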
Also, applying the binomial expansion to the power of $(1-h)$ that appears in the posterior, we get

$$(1-h)^{(\cdot)} = \sum_{k=0}^{\infty}(-1)^k g_k h^k,$$

where $g_k$ is the corresponding binomial coefficient [exact expression illegible in source].
Also, we expand $\phi^{l_2}$ in a similar way and write

[Equation illegible in source.]

and

$$(1+th)^{(\cdot)} = \sum_{v=0}^{\infty} p_v\,(th)^v,$$

where these binomial expansions are valid when $|th| < 1$, which will be assumed.
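The expansions above are instances of the generalized binomial series $(1+x)^\alpha = \sum_{v \ge 0}\binom{\alpha}{v}x^v$, which converges for $|x| < 1$; this is why a condition of the form $|th| < 1$ is needed. A short sketch of the partial sums follows, with illustrative exponents; outside $|x| < 1$ the partial sums grow without bound.

```python
import math


def binomial_series(alpha, x, terms=200):
    """Partial sum of sum_v C(alpha, v) x^v, which equals (1 + x)^alpha for |x| < 1."""
    total, coef = 0.0, 1.0
    for v in range(terms):
        total += coef * x ** v
        coef *= (alpha - v) / (v + 1)        # C(alpha, v + 1) from C(alpha, v)
    return total
```

For example, `binomial_series(-1.75, -0.6)` approximates $(1-0.6)^{-1.75}$, the kind of expansion applied to a negative power of $(1-h)$.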
Thus, using these expansions, we write

[Equation illegible in source.]

where $u$ collects the signs, the coefficients $g_k$ and $p_v$, and the factorials $2^m\,l_1!\,l_2!\,l_3!$ arising from the expansions [exact expression illegible in source].

Using these expansions, the posterior density of $(\sigma^2, \sigma_\alpha^2, h)$ is written as

[Equation illegible in source; each term carries the factor $\exp\{-\tfrac12[C_0/\sigma^2 + C_1/\sigma_\alpha^2 + h\,y'y/\sigma^2]\}$.]

where the multiple summation sign stands for $\sum_{m=0}^{\infty}\sum_{\{l_j\}}\sum_{k=0}^{\infty}\sum_{v=0}^{\infty}$ for notational convenience, and where the exponents $n$ and $d$ are linear combinations of the summation indices [exact expressions illegible in source].

On integrating with respect to $h$, we obtain the posterior density of $(\sigma^2, \sigma_\alpha^2)$ as

[Equation illegible in source.]

where on integration with respect to $h$ a constant of integration equal to $A(d+1, 1)$ is obtained. This posterior of $(\sigma^2, \sigma_\alpha^2)$ is referred to in the discussion in Section 5.2.1.


Appendix G


The Expressions for $U_{11}$, $U_{22}$ in Chapter 7


The expressions for $U_{11}$ and $U_{22}$ used in Chapter 7 are given in this appendix. The first-order conditions, as seen earlier, are

(1) [Equation illegible in source.]
(2) [Equation illegible in source.]

Differentiating (1) with respect to $N_1$, we get

[Equation illegible in source.]

Similarly, differentiating (2) with respect to $N_2$, we get

[Equation illegible in source.]


These are the expressions that were required in Section 7.3. 
Appendix H
Likelihood Function in Chapter 8


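The likelihood function being considered is

$$L \propto |\Sigma|^{-1/2}\exp\{-\tfrac{1}{2}\,y'\Sigma^{-1}y\},$$

where $\Sigma$ was defined earlier. Now, since $\Sigma = \mathrm{Cov}(y \mid X, V) = \sigma^2 I + \sigma_\alpha^2 ZZ' + \cdots$ (the last term is illegible in the source), we get

$$|\Sigma|^{-1/2} = (\sigma^2\delta)^{-T/2}\,\big|I + (t/\delta)ZZ'\big|^{-1/2},$$

which can be rewritten as

$$|\Sigma|^{-1/2} = (\sigma^2\delta)^{-T/2}\prod_{i=1}^{K}(1 + t\lambda_i/\delta)^{-1/2},$$

where $t = \sigma_\alpha^2/\sigma^2$, using the fact that $ZZ'$ and $Z'Z$ have identical eigenvalues except for a set of zero eigenvalues. Using these results, we modify the likelihood function as

$$L \propto (\sigma^2\delta)^{-T/2}\prod_{i=1}^{K}(1 + t\lambda_i/\delta)^{-1/2}\exp\Big\{-\frac{1}{2\sigma^2\delta}\,y'\big(I + (t/\delta)ZZ'\big)^{-1}y\Big\}.$$

This is further expressed as

$$L \propto (\sigma^2\delta)^{-T/2}\prod_{i=1}^{K}(1 + t\lambda_i/\delta)^{-1/2}\exp\Big\{-\frac{1}{2\sigma^2\delta}\Big[y'y - \frac{t}{\delta}\,q'\Big(\frac{t}{\delta}\Lambda + I\Big)^{-1}q\Big]\Big\},$$

where, using result T8 in Appendix I of Leamer (1978), p. 324, we write

$$y'\big(I + (t/\delta)ZZ'\big)^{-1}y = y'y - \frac{t}{\delta}\,q'\Big(I + \frac{t}{\delta}\Lambda\Big)^{-1}q,$$

where $q = Z'y$.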
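The simplified likelihood can be evaluated in $O(TK)$ operations once the $\lambda_i$ and $q = Z'y$ are available, with no $T \times T$ inverse. The sketch below assumes $\Sigma = \sigma^2\delta\,(I + (t/\delta)ZZ')$, which is the form the determinant expression above implies, and, so that $(t\Lambda/\delta + I)^{-1}$ is diagonal, that $Z$ has mutually orthogonal columns ($Z'Z = \Lambda = \mathrm{diag}(\lambda_i)$). It compares the eigenvalue form with a direct dense evaluation; all numerical values and function names are hypothetical, and additive constants such as $2\pi$ terms are dropped from both.

```python
import math


def det_and_solve(M, b):
    """Gaussian elimination with partial pivoting; returns (det(M), M^{-1} b)."""
    n = len(M)
    aug = [list(row) + [b[i]] for i, row in enumerate(M)]
    det = 1.0
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if piv != col:
            aug[col], aug[piv] = aug[piv], aug[col]
            det = -det
        det *= aug[col][col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for j in range(col, n + 1):
                aug[r][j] -= factor * aug[col][j]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (aug[r][n] - sum(aug[r][j] * x[j] for j in range(r + 1, n))) / aug[r][r]
    return det, x


def loglik_eigen(y, q, lam, sigma2, t, delta):
    """Log-likelihood from the simplified form; ASSUMES Z'Z = diag(lam)."""
    c = t / delta
    quad = sum(v * v for v in y) - c * sum(qi * qi / (c * li + 1.0)
                                           for qi, li in zip(q, lam))
    return (-0.5 * len(y) * math.log(sigma2 * delta)
            - 0.5 * sum(math.log(1.0 + c * li) for li in lam)
            - quad / (2.0 * sigma2 * delta))


def loglik_dense(y, Z, sigma2, t, delta):
    """-(1/2) log|Sigma| - (1/2) y' Sigma^{-1} y with the T x T matrix
    Sigma = sigma2 * delta * I + sigma2 * t * ZZ' built explicitly."""
    T, K = len(Z), len(Z[0])
    sigma = [[sigma2 * delta * (1.0 if i == j else 0.0)
              + sigma2 * t * sum(Z[i][k] * Z[j][k] for k in range(K))
              for j in range(T)] for i in range(T)]
    det, sol = det_and_solve(sigma, y)
    return -0.5 * math.log(det) - 0.5 * sum(a * b for a, b in zip(y, sol))


# Hypothetical data: Z has mutually orthogonal columns, so Z'Z is diagonal.
cols = [[0.5 * v for v in (1, 1, 1, 1, 1, 1)],
        [1.5 * v for v in (1, -1, 1, -1, 1, -1)],
        [2.0 * v for v in (1, 1, -1, -1, 0, 0)]]
Z = [list(row) for row in zip(*cols)]                          # T x K
y = [0.3, -1.2, 0.8, 0.5, -0.4, 1.1]
lam = [sum(v * v for v in col) for col in cols]                # diagonal of Z'Z
q = [sum(zc * yi for zc, yi in zip(col, y)) for col in cols]   # q = Z'y

fast = loglik_eigen(y, q, lam, sigma2=0.8, t=1.7, delta=1.3)
slow = loglik_dense(y, Z, sigma2=0.8, t=1.7, delta=1.3)
```

In a posterior computation, `loglik_eigen` would be called repeatedly over a grid of variance parameters, which is where avoiding the dense $T \times T$ work pays off.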